Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve, Richard is right when he says temporal simultaneity is not a sufficient principle. Suppose you present your system with the following sequences (letters could be substituted for sounds, colors, objects, whatever): ABCABCABCABC... AAABBBAAABBB... ABBAAAABB... ABBCCCEFF... ABACABADABACABAEABACABADABACABA... All of these sequences have concepts behind them. All of these concepts are immune to temporal-simultaneity learning (although the first could be learned by temporal adjacency, and the second by temporal adjacency with a delay of 3). The transition to sequence learning is (at least in my eyes) a transition to relational learning, as opposed to the flat learning that PCA is designed for. In other words, completely new methods are required. You already begin that transition by invoking dp/dt, which assumes a temporal aspect to the data... See this blog post for a fuller account of my view on the current state of affairs. (It started out as a post about a new algorithm I'd been thinking about, but turned into an essay on the difference between relational methods and flat (propositional) methods, and how to bridge the gap. If you're wondering about the title, see the previous post.) http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html --Abram

On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield steve.richfi...@gmail.com wrote:

Richard, On 12/25/08, Richard Loosemore r...@lightlink.com wrote: Steve Richfield wrote: There are doubtless exceptions to my broad statement, but generally, neuron functionality is WIDE open to be pretty much ANYTHING you choose, including that of an AGI engine's functionality on its equations. In the reverse, any NN could be expressed in a shorthand form that contains structure, synapse functions, etc., and an AGI engine could be built/modified to function according to that shorthand. In short, mapping between NN and AGI forms presumes flexibility in the functionality of the target form. Where that flexibility is NOT present, e.g. because of orthogonal structure, etc., then you must ask whether something is being gained or lost by the difference. Clearly, any transition that involves a loss should be carefully examined to see if the entire effort is headed in the wrong direction, which I think was your original point here.

There is a problem here. When someone says "X and Y can easily be mapped from one form to the other" there is an implication that they are not suggesting that we go right down to the basic constituents of both X and Y in order to effect the mapping. Thus: "Chalk and cheese can easily be mapped from one to the other" is trivially true if we are prepared to go down to the common denominator of electrons, protons and neutrons. But if we stay at a sensible level then, no, these do not map onto one another.

The problem here is that you were thinking of presently existing NN and AGI systems, neither of which works (yet) in any really useful way, so that it was obviously impossible to directly convert from one system with its set of bad assumptions to another system with a completely different set of bad assumptions. I completely agree, but I assert that the answer to that particular question is of no practical interest to anyone. On the other hand, converting between NN and AGI systems built on the SAME set of assumptions would be simple. This situation doesn't yet exist. Until then, converting a program from one dysfunctional platform to another is uninteresting.
When the assumptions get ironed out, then all systems will be built on the same assumptions, and there will be few problems going between them, EXCEPT: things need to be arranged in arrays for automated learning, which fits the present NN paradigm much better than the present AGI paradigm.

Similarly, if you claim that NN and regular AGI map onto one another, I assume that you are saying something more substantial than that these two can both be broken down into their primitive computational parts, and that when this is done they seem equivalent. Even this breakdown isn't required if both systems are built on the same correct assumptions. HOWEVER, I see no way to transfer fast learning from an NN-like construction to an AGI-like construction. Do you? If there is no answer to this question, then this unanswerable question would seem to redirect AGI efforts to NN-like constructions if they are ever to learn like we do.

NN and regular AGI, the way they are understood by people who understand them, have very different styles of constructing intelligent systems. Neither of which works (yet). Of course, we are both trying to fill in the gaps. Sure, you can code both in C, or Lisp, or Cobol, but that is to trash the real meaning of "are easily mapped onto one another". One of my favorite consulting projects involved coding an AI program to solve complex problems that were roughly
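[To make Abram's sequence examples concrete, here is a minimal sketch, not from the thread, of a pure temporal-adjacency learner: an online bigram predictor that guesses the most frequent successor seen so far. It masters ABCABC... but plateaus on ABACABAD..., whose structure needs more context than adjacent pairs can carry.]

from collections import Counter, defaultdict

def bigram_accuracy(seq):
    # Online temporal-adjacency learner: for each symbol, guess the
    # most frequent successor seen so far, then update the counts.
    counts = defaultdict(Counter)
    correct = 0
    for prev, nxt in zip(seq, seq[1:]):
        if counts[prev]:
            correct += counts[prev].most_common(1)[0][0] == nxt
        counts[prev][nxt] += 1
    return correct / (len(seq) - 1)

print(bigram_accuracy("ABC" * 50))       # approaches 1.0: adjacency suffices
print(bigram_accuracy("ABACABAD" * 20))  # settles near 0.75: what follows A
                                         # depends on context adjacency can't see

[The second score stalls because A's successor (B, C, or D) is determined by where you are in the larger pattern, which is exactly the relational structure the plain adjacency statistics cannot represent.]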
Re: [agi] Universal intelligence test benchmark
2008/12/26 Matt Mahoney matmaho...@yahoo.com: I have updated my universal intelligence test with benchmarks on about 100 compression programs.

Humans aren't particularly good at compressing data. Does this mean humans aren't intelligent, or is it a poor definition of intelligence?

Although my goal was to sample a Solomonoff distribution to measure universal intelligence (as defined by Hutter and Legg),

If I define intelligence as the ability to catch mice, does that mean my cat is more intelligent than most humans? More to the point, I don't understand the point of defining intelligence this way. Care to enlighten me? -- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
Re: [agi] Universal intelligence test benchmark
Philip Hunt wrote: 2008/12/26 Matt Mahoney matmaho...@yahoo.com: I have updated my universal intelligence test with benchmarks on about 100 compression programs. Humans aren't particularly good at compressing data. Does this mean humans aren't intelligent, or is it a poor definition of intelligence? Although my goal was to sample a Solomonoff distribution to measure universal intelligence (as defined by Hutter and Legg), If I define intelligence as the ability to catch mice, does that mean my cat is more intelligent than most humans? More to the point, I don't understand the point of defining intelligence this way. Care to enlighten me?

This may or may not help, but in the past I have pursued exactly these questions, only to get such confusing, evasive and circular answers, all of which amounted to nothing meaningful, that eventually I (like many others) have just had to give up and not engage any more. So, the real answers to your questions are that no, compression is an extremely poor definition of intelligence; and yes, defining intelligence to be something completely arbitrary (like the ability to catch mice) is what Hutter and Legg's analyses are all about. Searching for previous posts of mine which mention Hutter, Legg or AIXI will probably turn up a number of lengthy discussions in which I took a deal of trouble to debunk this stuff. Feel free, of course, to make your own attempt to extract some sense from it all, and by all means let me know if you eventually come to a different conclusion. Richard Loosemore
Re: [agi] Universal intelligence test benchmark
I'll try to answer this one...

1) In a nutshell, the algorithmic info. definition of intelligence is like this: intelligence is the ability of a system to achieve a goal that is randomly selected from the space of all computable goals, according to some defined probability distribution on computable-goal space.

2) Of course, if one had a system that was highly intelligent according to the above definition, it would be a great compressor.

3) There are theorems stating that if you have a great compressor, then by wrapping a little code around it, you can get a system that will be highly intelligent according to the algorithmic info. definition. The catch is that this system (as constructed in the theorems) will use insanely, infeasibly much computational resource.

What are the weaknesses of the approach?

A) The real problem of AI is to make a system that can achieve complex goals using feasibly much computational resource.

B) Workable strategies for achieving complex goals using feasibly much computational resource may be highly dependent on the particular probability distribution over goal space mentioned in 1 above.

For this reason, I'm not sure the algorithmic info. approach is of much use for building real AGI systems. I note that Shane Legg is now directing his research toward designing practical AGI systems along totally different lines, not directly based on any of the alg. info. stuff he worked on in his thesis. However, Marcus Hutter, Juergen Schmidhuber and others are working on methods of scaling down the approaches mentioned in 3 above (AIXItl, the Godel Machine, etc.) so as to yield feasible techniques. So far this has led to some nice machine learning algorithms (e.g. the parameter-free temporal difference reinforcement learning scheme in part of Legg's thesis, and Hutter's new work on Feature Bayesian Networks and so forth), but nothing particularly AGI-ish. But personally I wouldn't be harshly dismissive of this research direction, even though it's not the one I've chosen. -- Ben G

On Fri, Dec 26, 2008 at 3:53 PM, Richard Loosemore r...@lightlink.com wrote:

Philip Hunt wrote: 2008/12/26 Matt Mahoney matmaho...@yahoo.com: I have updated my universal intelligence test with benchmarks on about 100 compression programs. Humans aren't particularly good at compressing data. Does this mean humans aren't intelligent, or is it a poor definition of intelligence? Although my goal was to sample a Solomonoff distribution to measure universal intelligence (as defined by Hutter and Legg), If I define intelligence as the ability to catch mice, does that mean my cat is more intelligent than most humans? More to the point, I don't understand the point of defining intelligence this way. Care to enlighten me?

This may or may not help, but in the past I have pursued exactly these questions, only to get such confusing, evasive and circular answers, all of which amounted to nothing meaningful, that eventually I (like many others) have just had to give up and not engage any more. So, the real answers to your questions are that no, compression is an extremely poor definition of intelligence; and yes, defining intelligence to be something completely arbitrary (like the ability to catch mice) is what Hutter and Legg's analyses are all about. Searching for previous posts of mine which mention Hutter, Legg or AIXI will probably turn up a number of lengthy discussions in which I took a deal of trouble to debunk this stuff.
Feel free, of course, to make your own attempt to extract some sense from it all, and by all means let me know if you eventually come to a different conclusion. Richard Loosemore

-- Ben Goertzel, PhD CEO, Novamente LLC and Biomind LLC Director of Research, SIAI b...@goertzel.org "I intend to live forever, or die trying." -- Groucho Marx
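[For reference, the definition Ben summarizes in point 1 is Legg and Hutter's published universal intelligence measure; this is the standard statement, not something introduced in this thread:

\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}

where \pi is the agent, E is the space of computable reward-generating environments, K(\mu) is the Kolmogorov complexity of environment \mu (so simpler environments carry more weight, playing the role of the "defined probability distribution on computable-goal space"), and V^{\pi}_{\mu} is the expected total reward \pi achieves in \mu.]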
RE: [agi] SyNAPSE might not be a joke ---- was ---- Building a machine that can learn from experience
Richard, Since you are clearly in the mode you routinely get into when you start losing an argument on this list --- as has happened so many times before --- i.e., of ceasing all further productive communication on the actual subject of the argument --- this will be my last communication with you on this subject --- that is --- unless you actually come up with some reasonable support for your brash statement that the central core of Tononi's paper, which I cited, was nonsense. I have better things to do than get into extended arguments with people who are as intellectually dishonest as you become when you start losing an argument. Ed Porter

-Original Message- From: Richard Loosemore [mailto:r...@lightlink.com] Sent: Friday, December 26, 2008 1:03 AM To: agi@v2.listbox.com Subject: Re: [agi] SyNAPSE might not be a joke was Building a machine that can learn from experience

Ed Porter wrote: Why is it that people who repeatedly and insultingly say other people's work or ideas are total nonsense -- without any reasonable justification -- are still allowed to participate in the discussion on the AGI list?

Because they know what they are talking about. And because they got that way by having a low tolerance for fools, nonsense and people who can't tell the difference between the critique of an idea and a personal insult. ;-) Richard Loosemore
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve, It is strange to claim that prior PhDs will be worthless when what you are suggesting is that we apply the standard methods to a different representation. But that is beside the present point. :)

Taking the derivative, or just finite differences, is a useful step in more ways than one. You are talking about taking differences over time, but differences over space can be used for edge detection, frequently thought of as the first step in visual processing. More generally, any transform that makes the data more sparse, or simpler, seems good -- which is of course what PCA does, and derivatives in time/space, and also the Fourier transform I think. The usefulness of these transforms springs from underlying regularities in the data.

That's not to say that I don't think some representations are fundamentally more useful than others -- for example, I know that some proofs are astronomically larger in 1st-order logic as compared to 2nd-order logic, even in domains where 1st-order logic is representationally sufficient.

The statement about time correction reminds me of a system called PURR-PUSS. It is Turing-complete in some sense, essentially by compounding time-delays, but I do not know exactly what sense (i.e., a Turing-complete *learner* is very different from a Turing-complete *programmable computer*... PURR-PUSS uses something in between called "soft teaching" if I recall correctly.) --Abram

On Fri, Dec 26, 2008 at 3:26 PM, Steve Richfield steve.richfi...@gmail.com wrote:

Abram, On 12/26/08, Abram Demski abramdem...@gmail.com wrote: Steve, Richard is right when he says temporal simultaneity is not a sufficient principle.

... and I fully agree. However, we must unfold this thing one piece at a time. Without the dp/dt trick, there doesn't seem to be any way to make unsupervised learning work, and I appear to be the first to stumble onto dp/dt. This is a whole new and unexplored world, where the things that stymied past unsupervised efforts fall out effortlessly, but some new challenges present themselves.

Suppose you present your system with the following sequences (letters could be substituted for sounds, colors, objects, whatever): ABCABCABCABC... AAABBBAAABBB... ABBAAAABB... ABBCCCEFF... ABACABADABACABAEABACABADABACABA... All of these sequences have concepts behind them. All of these concepts are immune to temporal-simultaneity learning (although the first could be learned by temporal adjacency, and the second by temporal adjacency with a delay of 3).

The way that wet neurons are built, this is unavoidable! Here is another snippet from my paper...

Time Correction

Electronics designers routinely use differentiation and integration to advance and retard timing. Phase-linear low-pass filters are often used to make short delays in a signal, and peaking capacitors were used in RTL (Resistor Transistor Logic) to differentiate inputs for quicker output. Further, wet neurons introduce their own propagation delays from input synapse to output synapse. If not somehow corrected, the net effect of this is a scrambling of the time that a given signal/node/term represents, which would result in relating signals together that are arbitrarily shifted in time. There seem to be three schools of thought regarding this:

1. No problem. This simply results in considering various things shifted arbitrarily in time. When wet neurons learn what works, this will result in recognizing time-sequenced phenomena. Arbitrary delays might also do a lot for artificial neurons.
2. Time correction could be instituted, e.g. through Taylor series signal extrapolation to in effect remove a neuron's delay, at the cost of introducing considerable noise into the result. My own simulations of Taylor series extrapolation functions showed that the first derivative may indeed help for small corrections, but beyond that, subtle changes in the shape of a transition cause wild changes in the extrapolated result, sometimes going so far as to produce short bursts of oscillation. Downstream neurons may then amplify these problems to produce havoc at the output of the artificial neural network.

3. The method utilized in CRAY computers might be in use, where all delays were a precise multiple of the clock period. This was achieved by using interconnecting wires cut to certain specific lengths, even though the length may be much longer than actually physically needed to interconnect two components. Perhaps wet neurons only come in certain very specific delays. There is some laboratory evidence for this, as each section of our brains has neurons with similar geometry within the group. This has been presumed to be an artifact of evolution and limited DNA space, but may in fact be necessary for proper time correction.

No one now knows which of these are in use in wet neurons. However, regardless of wet-neuron functionality, artificial
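[An aside on school 2: a toy numerical sketch, mine rather than Steve's simulations, of first-order Taylor extrapolation on a noisy sine wave shows the same qualitative behavior he reports: small corrections help, larger ones amplify noise badly.]

import numpy as np

rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0.0, 10.0, dt)
clean = np.sin(t)
noisy = clean + rng.normal(0.0, 0.01, t.size)   # small sensor noise

def taylor_extrapolate(x, dt, k):
    # First-order Taylor: x(t + k*dt) ~ x(t) + k*dt * x'(t),
    # with x' estimated by a finite difference.
    return x + k * dt * np.gradient(x, dt)

for k in (1, 10, 50):
    pred = taylor_extrapolate(noisy, dt, k)
    rms = np.sqrt(np.mean((pred[:-k] - clean[k:]) ** 2))
    print(f"correcting {k:2d} samples ahead: RMS error = {rms:.4f}")

[Because finite differencing amplifies high-frequency noise, and the extrapolation then multiplies the derivative by the correction distance, the error grows rapidly with k, which is consistent with the "considerable noise" caveat above.]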
RE: [agi] Universal intelligence test benchmark
From: Matt Mahoney [mailto:matmaho...@yahoo.com] --- On Fri, 12/26/08, Philip Hunt cabala...@googlemail.com wrote: Humans aren't particularly good at compressing data. Does this mean humans aren't intelligent, or is it a poor definition of intelligence? Humans are very good at predicting sequences of symbols, e.g. the next word in a text stream. However, humans are not very good at resetting their mental states and deterministically reproducing the exact sequence of learning steps and assignment of probabilities, which is what you need to decompress the data. Fortunately this is not a problem for computers.

Human memory storage may be lossy compression and recall may be decompression. Some very rare individuals remember every day of their life in vivid detail; I'm not sure what that means in terms of memory storage. How does consciousness fit into your compression intelligence modeling?

The thing about the word "compression" is that it is bass-ackwards when talking about intelligence. The word describes kind of an external effect, instead of an internal reconfiguration/re-representation. Also there is a difference between a goal of achieving maximum compression versus a goal of achieving a high-efficiency data description. Max compression implies hacks, kludges and a large decompressor.

Here is a simple example of human memory compression/decompression: when you think of space, air or emptiness, like driving across Kansas, looking at the moon, or waiting idly over a period of time, do you store the emptiness and redundancy or does it get compressed out? On the trip across Kansas you remember the starting point, rest stops, and the end, not the full duration. It's a natural compression. In fact I'd say this is a partially lossless compression though more lossy... maybe it is incidental but it is still there. John
Re: [agi] Universal intelligence test benchmark
Most compression tests are like defining intelligence as the ability to catch mice. They measure the ability of compressors to compress specific files. This tends to lead to hacks that are tuned to the benchmarks. For the generic intelligence test, all you know about the source is that it has a Solomonoff distribution (for a particular machine). I don't know how you could make the test any more generic.

IMO the test is *too* generic ... I don't think real-world AGI is mainly about being able to recognize totally general patterns in totally general datasets. I suspect that to do that, the best approach is ultimately going to be some AIXItl variant ... meaning it's a problem that's not really solvable using a real-world amount of resources. I suspect that all the AGI systems one can really build are SO BAD at this general problem that it's better to characterize AGI systems NOT in terms of how well they do at this general problem, but rather in terms of what classes of datasets/environments they are REALLY GOOD at recognizing patterns in.

I think the environments existing in the real physical and social world are drawn from a pretty specific probability distribution (compared to, say, the universal prior), and that for this reason, looking at problems of compression or pattern recognition across general program spaces without real-world-oriented biases is not going to lead to real-world AGI. The important parts of AGI design are the ones that (directly or indirectly) reflect the specific distribution of problems that the real world presents an AGI system. And this distribution is **really hard** to encapsulate in a text compression database. Because we don't know what this distribution is. And this is why we should be working on AGI systems that interact with the real physical and social world, or the most accurate simulations of it we can build. -- Ben G
Re: [agi] Universal intelligence test benchmark
2008/12/26 Matt Mahoney matmaho...@yahoo.com: Humans are very good at predicting sequences of symbols, e.g. the next word in a text stream.

Why not have that as your problem domain, instead of text compression?

Most compression tests are like defining intelligence as the ability to catch mice. They measure the ability of compressors to compress specific files. This tends to lead to hacks that are tuned to the benchmarks. For the generic intelligence test, all you know about the source is that it has a Solomonoff distribution (for a particular machine). I don't know how you could make the test any more generic.

It seems to me that you and Hutter are interested in a problem domain that consists of:

1. generating random Turing machines
2. running them to produce output
3. feeding the output as input to another program P, which will then guess future characters based on previous ones
4. having P use these guesses to do compression

May I suggest that instead you modify this problem domain by:

(a) removing clause 1 -- it's not fundamentally interesting that output comes from a Turing machine. Maybe instead make the output come from a program (written by humans and interesting to humans) in a normal programming language that people would actually use to write code in

(b) removing clause 4 -- compression is a bit of a red herring here; what's important is to predict future output based on past output

IMO if you made these changes, your problem domain would be a more useful one. While you're at it you may want to change the size of the chunks in each item of prediction, from characters to either strings or s-expressions. Though doing so doesn't fundamentally alter the problem. -- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
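[A minimal sketch of the prediction-only domain Philip proposes, i.e. clause 3 scored directly with no compression step. The class name and the toy order-n context model are illustrative assumptions, not code from the thread; "program P" here just guesses the most frequent continuation of the last n symbols.]

from collections import Counter, defaultdict

class ContextPredictor:
    # Toy stand-in for "program P": guess the next symbol using the
    # last n symbols as context (an order-n model), learning online.
    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)
        self.history = ""

    def predict(self):
        c = self.counts[self.history]
        return c.most_common(1)[0][0] if c else None

    def observe(self, sym):
        self.counts[self.history][sym] += 1
        self.history = (self.history + sym)[-self.n:]

def score(predictor, stream):
    # Fraction of symbols guessed correctly: prediction accuracy,
    # with compression removed as per suggestion (b).
    hits = 0
    for sym in stream:
        hits += predictor.predict() == sym
        predictor.observe(sym)
    return hits / len(stream)

for n in (1, 2, 4):
    print(n, score(ContextPredictor(n), "ABACABAD" * 40))

[On this stream the score improves as n grows, approaching 1.0 by n = 4; changing the chunk size from characters to strings or s-expressions, as Philip notes, would change the alphabet but not the shape of the problem.]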
Re: [agi] Universal intelligence test benchmark
2008/12/27 Ben Goertzel b...@goertzel.org: And this is why we should be working on AGI systems that interact with the real physical and social world, or the most accurate simulations of it we can build.

Or some other domain that may have some practical use, e.g. understanding program source code. -- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
Re: Real-world vs. universal prior (was Re: [agi] Universal intelligence test benchmark)
Suppose I take the universal prior and condition it on some real-world training data. For example, if you're interested in real-world vision, take 1000 frames of real video, and then the proposed probability distribution is the portion of the universal prior that explains the real video. (I can mathematically define this if there is interest, but I'm guessing the other people here can too, so maybe we can skip that. Speak up if I'm being too unclear.) Do you think the result is different in an important way from the real-world probability distribution you're looking for? -- Tim Freeman http://www.fungible.com t...@fungible.com

No, I think that in principle that's the right approach ... but that simple, artificial exercises like conditioning the prior on photos don't come close to capturing the richness of statistical structure in the physical universe ... or in the subsets of the physical universe that humans typically deal with... ben
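[Tim's construction can be written out with the standard definitions; this is textbook Solomonoff induction, not something new to the thread. Let M be the universal semimeasure on a universal prefix machine U,

M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|}

the total weight of programs p whose output begins with x. Conditioning on training data D (say, the encoded video frames) gives

M(x \mid D) \;=\; \frac{M(Dx)}{M(D)}

i.e., the renormalized weight of exactly those programs that already explain D and go on to produce x, which is the "portion of the universal prior that explains the real video".]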
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Much of AI and pretty much all of AGI is built on the proposition that we humans must code knowledge because the stupid machines can't efficiently learn it on their own, in short, that UNsupervised learning is difficult.

No, in fact almost **no** AGI is based on this proposition. Cyc is based strictly on this proposition ... some other GOFAI-ish systems like SOAR are based on weaker forms of this proposition ... but this is really a minority view in the AGI world, and a view taken by very few designs created in the last decade ... sociologically, it seems to be a view that peaked in the 70's and 80's... -- Ben G
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve, When I made the statement about Fourier I was thinking of JPEG encoding. A little digging found this book, which presents a unified approach to (low-level) computer vision based on the Fourier transform: http://books.google.com/books?id=1wJuTMbNT0MCdq=fourier+visionprintsec=frontcoversource=blots=3ogSJ2i5uWsig=ZdvvWvu82q8UX1c5Abq6hWvgZCYhl=ensa=Xoi=book_resultresnum=2ct=result#PPA4,M1

But that is beside the present point. :) Probably so. I noticed that you recently graduated, so I thought that I would drop that thought to make (or unmake) your day. :)

I should really update that. It's been a while now.

Generally, any transform that makes the data more sparse, or simpler, seems good. Certainly if it results in extracting some useful figure of merit. -- which is of course what PCA does, Sometimes yes, and sometimes no. I am looking at incremental PCA approaches that reliably extract separate figures of merit rather than smushed-together figures of merit as PCA often does.

How do you define "figures of merit"? Sounds like an ill-defined problem to me. We don't know which features we *really* want to extract from an image until we know the utility function of the environment, and so know what information will help us achieve our goals. --Abram

On Sat, Dec 27, 2008 at 12:01 AM, Steve Richfield steve.richfi...@gmail.com wrote:

Abram, On 12/26/08, Abram Demski abramdem...@gmail.com wrote: Steve, It is strange to claim that prior PhDs will be worthless when what you are suggesting is that we apply the standard methods to a different representation.

Much of AI and pretty much all of AGI is built on the proposition that we humans must code knowledge because the stupid machines can't efficiently learn it on their own, in short, that UNsupervised learning is difficult. Note that in nature, UNsupervised learning handily outperforms supervised learning. What good is supervised NN technology when UNsupervised NNs will perform MUCH better? What good are a few hand-coded AGI rules and the engine that runs them, when an UNsupervised AGI can learn them orders of magnitude faster than cities full of programmers? Note my prior post where I explain that AGIs must either abandon UNsupervised learning or switch to a NN-like implementation. In short, easy UNsupervised learning will change things about as much as the switch from horse and buggy to automobiles, leaving present PhDs in the position of blacksmiths and historians. Sure, blacksmiths had transferable skills, but they weren't worth much and they weren't respected at all. In the 1980s, countless top computer people (including myself) had to expunge all references to mainframe computers from our resumes in order to find work in a microcomputer-dominated field. I expect to see rounds of the same sort of insanity when UNsupervised learning emerges.

But that is beside the present point. :)

Probably so. I noticed that you recently graduated, so I thought that I would drop that thought to make (or unmake) your day.

Taking the derivative, or just finite differences, is a useful step in more ways than one. You are talking about taking differences over time, but differences over space can be used for edge detection, frequently thought of as the first step in visual processing.

Correct. My paper goes into using any dimension that is differentiable. Note that continuous eye movement converts a physical dimension to the time domain.
More generally, any transform that makes the data more sparse, or simpler, seems good.

Certainly if it results in extracting some useful figure of merit.

-- which is of course what PCA does,

Sometimes yes, and sometimes no. I am looking at incremental PCA approaches that reliably extract separate figures of merit rather than smushed-together figures of merit as PCA often does. Another problem with classical PCA is that it can't provide real-time learning, but instead works via a sort of batch processing of statistics collected in the array that is being transformed.

and derivatives in time/space, and also the Fourier transform I think. The usefulness of these transforms springs from underlying regularities in the data.

Hmmm, I don't see where a Fourier transform would enter the cognitive process. Perhaps you see something that I have missed?

That's not to say that I don't think some representations are fundamentally more useful than others -- for example, I know that some proofs are astronomically larger in 1st-order logic as compared to 2nd-order logic, even in domains where 1st-order logic is representationally sufficient. The statement about time correction reminds me of a system called PURR-PUSS.

However, as I understand it, the Purposeful Unprimed Real-world Robot with Predictors Using Short Segments still relied on rewards and punishments for learning.

It is Turing-complete in some sense, essentially by compounding time-delays, but I do not
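[For readers following the incremental-PCA subthread: one standard online alternative to batch PCA is Oja's rule, a single Hebbian-style update that converges to the first principal component without accumulating batch statistics. A minimal sketch, offered as illustration rather than as Steve's approach:]

import numpy as np

def oja_first_component(X, lr=0.01, epochs=20, seed=0):
    # Oja's rule: a one-neuron online learner whose weight vector
    # converges to the leading principal component of centered data.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x                    # neuron output
            w += lr * y * (x - y * w)    # Hebbian term minus decay; keeps |w| near 1
    return w / np.linalg.norm(w)

# Toy data: variance 9 along the first axis, 0.25 along the second.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
print(oja_first_component(X - X.mean(axis=0)))   # ~ [1, 0] up to sign

[Each sample updates the weights immediately, so the learning is genuinely real-time rather than batch, which addresses the real-time objection above, though it finds one component at a time rather than a full decomposition.]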
RE: [agi] Universal intelligence test benchmark
--- On Fri, 12/26/08, John G. Rose johnr...@polyplexic.com wrote: Human memory storage may be lossy compression and recall may be decompression. Some very rare individuals remember every day of their life in vivid detail, not sure what that means in terms of memory storage.

Human perception is a form of lossy compression which has nothing to do with the lossless compression that I use to measure prediction accuracy. Many lossless compressors use lossy filters too. A simple example is an order-n context where we discard everything except the last n symbols.

How does consciousness fit into your compression intelligence modeling?

It doesn't. Why is consciousness important?

Max compression implies hacks, kludges and a large decompressor.

As I discovered with the large text benchmark. -- Matt Mahoney, matmaho...@yahoo.com
Spatial indexing (was Re: [agi] Universal intelligence test benchmark)
--- On Fri, 12/26/08, J. Andrew Rogers and...@ceruleansystems.com wrote: For example, there is no general indexing algorithm described in computer science.

Which was my thesis topic and is the basis of my AGI design. http://www.mattmahoney.net/agi2.html (I wanted to do my dissertation on AI/compression, but funding issues got in the way.) Distributed indexing is critical to an AGI design consisting of a huge number of relatively dumb specialists and an infrastructure for getting messages to the right ones. In my thesis, I proposed a vector space model where messages are routed in O(n) time over n nodes. The problem is that the number of connections per node has to be on the order of the number of dimensions in the search space. For text, that is about 10^5. There are many other issues, of course, such as fault tolerance, security and ownership. There has to be an economic incentive to contribute knowledge and computing resources, because it is too expensive for anyone to own it.

The human genome size has no meaningful relationship to the complexity of coding AGI.

Yes it does. It is an upper bound on the complexity of a baby. -- Matt Mahoney, matmaho...@yahoo.com
Re: [agi] Universal intelligence test benchmark
--- On Fri, 12/26/08, Ben Goertzel b...@goertzel.org wrote: IMO the test is *too* generic ...

Hopefully this work will lead to general principles of learning and prediction that could be combined with more specific techniques. For example, a common way to compress text is to encode it with one symbol per word and feed the result to a general purpose compressor. Generic compression should improve the back end.

My concern is the data is not generic enough. A string has an algorithmic complexity that is independent of language up to a small constant, but in practice that constant (the algorithmic complexity of the compiler) can be much larger than the string. I have not been able to find a good solution to this problem. I realize there are some very simple, Turing-complete systems, such as a 2-state machine with a 3-symbol alphabet, and a 6-state binary machine, as well as various cellular automata (like rule 110). The problem is that programming simple machines often requires long programs to do simple things. For example, it is difficult to find a simple language where the smallest program to output 100 zero bits is shorter than 100 bits. Existing languages and instruction sets tend to be complex and ad hoc in order to allow programmers to be expressive. -- Matt Mahoney, matmaho...@yahoo.com
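[The "independent of language up to a small constant" claim is the invariance theorem of Kolmogorov complexity, stated here for reference: for any two universal machines U and V there is a constant c_{U,V}, depending only on the machines and not on the string, such that for all x

|K_U(x) - K_V(x)| \;\le\; c_{U,V}

Matt's worry is that c_{U,V} is essentially the length of a translator between the two languages, and for the very simple machines above that translator can be far longer than the strings being measured.]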
Re: [agi] Universal intelligence test benchmark
--- On Fri, 12/26/08, Philip Hunt cabala...@googlemail.com wrote: Humans are very good at predicting sequences of symbols, e.g. the next word in a text stream. Why not have that as your problem domain, instead of text compression?

That's the same thing, isn't it?

While you're at it you may want to change the size of the chunks in each item of prediction, from characters to either strings or s-expressions. Though doing so doesn't fundamentally alter the problem.

In the generic test, the fundamental units are bits. It's not entirely suitable for most existing compressors, which tend to be byte oriented. But they are only byte oriented because a lot of data is structured that way. In general, it doesn't need to be. -- Matt Mahoney, matmaho...@yahoo.com
Re: Spatial indexing (was Re: [agi] Universal intelligence test benchmark)
--- On Sat, 12/27/08, Matt Mahoney matmaho...@yahoo.com wrote: In my thesis, I proposed a vector space model where messages are routed in O(n) time over n nodes.

Oops, O(log n). -- Matt Mahoney, matmaho...@yahoo.com
Re: [agi] Introducing Steve's Theory of Everything in cognition.
On Fri, Dec 26, 2008 at 11:56 PM, Abram Demski abramdem...@gmail.com wrote: That's not to say that I don't think some representations are fundamentally more useful than others -- for example, I know that some proofs are astronomically larger in 1st-order logic as compared to 2nd-order logic, even in domains where 1st-order logic is representationally sufficient.

Do you have any online references handy for these? One of the things I'm still trying to figure out is to just what extent it is necessary to go to higher-order logic to make interesting statements about program code, and this sounds like useful data.
RE: [agi] Universal intelligence test benchmark
From: Matt Mahoney [mailto:matmaho...@yahoo.com] How does consciousness fit into your compression intelligence modeling? It doesn't. Why is consciousness important?

I was just prodding you on this. Many people on this list talk about the requirements of consciousness for AGI, and I was imagining some sort of consciousness in one of your command-line compressors :) I've yet to grasp the relationship between intelligence and consciousness, though lately I think consciousness may be more of an evolutionary social thing. Home-grown digital intelligence, since it is a loner, may not require much consciousness IMO.

Max compression implies hacks, kludges and a large decompressor. As I discovered with the large text benchmark.

Yep, and the behavior of the metrics near max theoretical compression is erratic, I think? John