Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve, Richard is right when he says temporal simultaneity is not a sufficient principle. Suppose you present your system with the following sequences (letters could be substituted for sounds, colors, objects, whatever): ABCABCABCABC... AAABBBAAABBB... ABBAAAABB... ABBCCCEFF... ABACABADABACABAEABACABADABACABA... All of these sequences have concepts behind them. All of these concepts are immune to temporal-simultaneity learning (although the first could be learned by temporal adjacency, and the second by temporal adjacency with a delay of 3). The transition to sequence learning is (at least in my eyes) a transition to relational learning, as opposed to the flat learning that PCA is designed for. In other words, completely new methods are required. You already begin that transition by invoking dp/dt, which assumes a temporal aspect to the data... See this blog post for a fuller account of my view on the current state of affairs. (It started out as a post about a new algorithm I'd been thinking about, but turned into an essay on the difference between relational methods and flat (propositional) methods, and how to bridge the gap. If you're wondering about the title, see the previous post.) http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html --Abram

On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield steve.richfi...@gmail.com wrote:

Richard, On 12/25/08, Richard Loosemore r...@lightlink.com wrote: Steve Richfield wrote: There are doubtless exceptions to my broad statement, but generally, neuron functionality is WIDE open to be pretty much ANYTHING you choose, including that of an AGI engine's functionality on its equations. In the reverse, any NN could be expressed in a shorthand form that contains structure, synapse functions, etc., and an AGI engine could be built/modified to function according to that shorthand. In short, mapping between NN and AGI forms presumes flexibility in the functionality of the target form. Where that flexibility is NOT present, e.g. because of orthogonal structure, etc., then you must ask whether something is being gained or lost by the difference. Clearly, any transition that involves a loss should be carefully examined to see if the entire effort is headed in the wrong direction, which I think was your original point here.

There is a problem here. When someone says "X and Y can easily be mapped from one form to the other" there is an implication that they are not suggesting that we go right down to the basic constituents of both X and Y in order to effect the mapping. Thus: "Chalk and cheese can easily be mapped from one to the other" is trivially true if we are prepared to go down to the common denominator of electrons, protons and neutrons. But if we stay at a sensible level then, no, these do not map onto one another.

The problem here is that you were thinking of presently existing NN and AGI systems, neither of which works (yet) in any really useful way, so that it was obviously impossible to directly convert from one system with its set of bad assumptions to another system with a completely different set of bad assumptions. I completely agree, but I assert that the answer to that particular question is of no practical interest to anyone. On the other hand, converting between NN and AGI systems built on the SAME set of assumptions would be simple. This situation doesn't yet exist. Until then, converting a program from one dysfunctional platform to another is uninteresting.
When the assumptions get ironed out, then all systems will be built on the same assumptions, and there will be few problems going between them, EXCEPT: things need to be arranged in arrays for automated learning, which fits the present NN paradigm much better than the present AGI paradigm.

Similarly, if you claim that NN and regular AGI map onto one another, I assume that you are saying something more substantial than that these two can both be broken down into their primitive computational parts, and that when this is done they seem equivalent. Even this breakdown isn't required if both systems are built on the same correct assumptions. HOWEVER, I see no way to transfer fast learning from an NN-like construction to an AGI-like construction. Do you? If there is no answer to this question, then this unanswerable question would seem to redirect AGI efforts to NN-like constructions if they are ever to learn like we do.

NN and regular AGI, the way they are understood by people who understand them, have very different styles of constructing intelligent systems. Neither of which works (yet). Of course, we are both trying to fill in the gaps. Sure, you can code both in C, or Lisp, or Cobol, but that is to trash the real meaning of "are easily mapped onto one another". One of my favorite consulting projects involved coding an AI program to solve complex problems that were roughly
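[To make Abram's sequence examples concrete, here is a minimal sketch, not from the thread, of a pure temporal-adjacency learner: an online bigram predictor that guesses the most frequent successor seen so far. It masters ABCABC... but plateaus on ABACABAD..., whose structure needs more context than adjacent pairs can carry.]

from collections import Counter, defaultdict

def bigram_accuracy(seq):
    # Online temporal-adjacency learner: for each symbol, guess the
    # most frequent successor seen so far, then update the counts.
    counts = defaultdict(Counter)
    correct = 0
    for prev, nxt in zip(seq, seq[1:]):
        if counts[prev]:
            correct += counts[prev].most_common(1)[0][0] == nxt
        counts[prev][nxt] += 1
    return correct / (len(seq) - 1)

print(bigram_accuracy("ABC" * 50))       # approaches 1.0: adjacency suffices
print(bigram_accuracy("ABACABAD" * 20))  # settles near 0.75: what follows A
                                         # depends on context adjacency can't see

[The second score stalls because A's successor (B, C, or D) is determined by where you are in the larger pattern, which is exactly the relational structure the plain adjacency statistics cannot represent.]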
Re: [agi] Universal intelligence test benchmark
2008/12/26 Matt Mahoney matmaho...@yahoo.com: I have updated my universal intelligence test with benchmarks on about 100 compression programs.

Humans aren't particularly good at compressing data. Does this mean humans aren't intelligent, or is it a poor definition of intelligence?

Although my goal was to sample a Solomonoff distribution to measure universal intelligence (as defined by Hutter and Legg),

If I define intelligence as the ability to catch mice, does that mean my cat is more intelligent than most humans? More to the point, I don't understand the point of defining intelligence this way. Care to enlighten me? -- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
Re: [agi] Universal intelligence test benchmark
Philip Hunt wrote: 2008/12/26 Matt Mahoney matmaho...@yahoo.com: I have updated my universal intelligence test with benchmarks on about 100 compression programs. Humans aren't particularly good at compressing data. Does this mean humans aren't intelligent, or is it a poor definition of intelligence? Although my goal was to sample a Solomonoff distribution to measure universal intelligence (as defined by Hutter and Legg), If I define intelligence as the ability to catch mice, does that mean my cat is more intelligent than most humans? More to the point, I don't understand the point of defining intelligence this way. Care to enlighten me?

This may or may not help, but in the past I have pursued exactly these questions, only to get such confusing, evasive and circular answers, all of which amounted to nothing meaningful, that eventually I (like many others) have just had to give up and not engage any more. So, the real answers to your questions are that no, compression is an extremely poor definition of intelligence; and yes, defining intelligence to be something completely arbitrary (like the ability to catch mice) is what Hutter and Legg's analyses are all about. Searching for previous posts of mine which mention Hutter, Legg or AIXI will probably turn up a number of lengthy discussions in which I took a deal of trouble to debunk this stuff. Feel free, of course, to make your own attempt to extract some sense from it all, and by all means let me know if you eventually come to a different conclusion. Richard Loosemore
Re: [agi] Universal intelligence test benchmark
I'll try to answer this one...

1) In a nutshell, the algorithmic info. definition of intelligence is like this: intelligence is the ability of a system to achieve a goal that is randomly selected from the space of all computable goals, according to some defined probability distribution on computable-goal space.

2) Of course, if one had a system that was highly intelligent according to the above definition, it would be a great compressor.

3) There are theorems stating that if you have a great compressor, then by wrapping a little code around it, you can get a system that will be highly intelligent according to the algorithmic info. definition. The catch is that this system (as constructed in the theorems) will use insanely, infeasibly much computational resource.

What are the weaknesses of the approach?

A) The real problem of AI is to make a system that can achieve complex goals using feasibly much computational resource.

B) Workable strategies for achieving complex goals using feasibly much computational resource may be highly dependent on the particular probability distribution over goal space mentioned in 1 above.

For this reason, I'm not sure the algorithmic info. approach is of much use for building real AGI systems. I note that Shane Legg is now directing his research toward designing practical AGI systems along totally different lines, not directly based on any of the alg. info. stuff he worked on in his thesis. However, Marcus Hutter, Juergen Schmidhuber and others are working on methods of scaling down the approaches mentioned in 3 above (AIXItl, the Godel Machine, etc.) so as to yield feasible techniques. So far this has led to some nice machine learning algorithms (e.g. the parameter-free temporal difference reinforcement learning scheme in part of Legg's thesis, and Hutter's new work on Feature Bayesian Networks and so forth), but nothing particularly AGI-ish. But personally I wouldn't be harshly dismissive of this research direction, even though it's not the one I've chosen. -- Ben G

On Fri, Dec 26, 2008 at 3:53 PM, Richard Loosemore r...@lightlink.com wrote:

Philip Hunt wrote: 2008/12/26 Matt Mahoney matmaho...@yahoo.com: I have updated my universal intelligence test with benchmarks on about 100 compression programs. Humans aren't particularly good at compressing data. Does this mean humans aren't intelligent, or is it a poor definition of intelligence? Although my goal was to sample a Solomonoff distribution to measure universal intelligence (as defined by Hutter and Legg), If I define intelligence as the ability to catch mice, does that mean my cat is more intelligent than most humans? More to the point, I don't understand the point of defining intelligence this way. Care to enlighten me?

This may or may not help, but in the past I have pursued exactly these questions, only to get such confusing, evasive and circular answers, all of which amounted to nothing meaningful, that eventually I (like many others) have just had to give up and not engage any more. So, the real answers to your questions are that no, compression is an extremely poor definition of intelligence; and yes, defining intelligence to be something completely arbitrary (like the ability to catch mice) is what Hutter and Legg's analyses are all about. Searching for previous posts of mine which mention Hutter, Legg or AIXI will probably turn up a number of lengthy discussions in which I took a deal of trouble to debunk this stuff.
Feel free, of course, to make your own attempt to extract some sense from it all, and by all means let me know if you eventually come to a different conclusion. Richard Loosemore

-- Ben Goertzel, PhD CEO, Novamente LLC and Biomind LLC Director of Research, SIAI b...@goertzel.org "I intend to live forever, or die trying." -- Groucho Marx
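[For reference, the definition Ben summarizes in point 1 is Legg and Hutter's published universal intelligence measure; this is the standard statement, not something introduced in this thread:

\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}

where \pi is the agent, E is the space of computable reward-generating environments, K(\mu) is the Kolmogorov complexity of environment \mu (so simpler environments carry more weight, playing the role of the "defined probability distribution on computable-goal space"), and V^{\pi}_{\mu} is the expected total reward \pi achieves in \mu.]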
RE: [agi] SyNAPSE might not be a joke ---- was ---- Building a machine that can learn from experience
Richard, Since you are clearly in the mode you routinely get into when you start losing an argument on this list --- as has happened so many times before --- i.e., of ceasing all further productive communication on the actual subject of the argument --- this will be my last communication with you on this subject --- that is --- unless you actually come up with some reasonable support for your brash statement that the central core of Tononi's paper, which I cited, was nonsense. I have better things to do than get into extended arguments with people who are as intellectually dishonest as you become when you start losing an argument. Ed Porter

-Original Message- From: Richard Loosemore [mailto:r...@lightlink.com] Sent: Friday, December 26, 2008 1:03 AM To: agi@v2.listbox.com Subject: Re: [agi] SyNAPSE might not be a joke was Building a machine that can learn from experience

Ed Porter wrote: Why is it that people who repeatedly and insultingly say other people's work or ideas are total nonsense -- without any reasonable justification -- are still allowed to participate in the discussion on the AGI list?

Because they know what they are talking about. And because they got that way by having a low tolerance for fools, nonsense and people who can't tell the difference between the critique of an idea and a personal insult. ;-) Richard Loosemore
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve, It is strange to claim that prior PhDs will be worthless when what you are suggesting is that we apply the standard methods to a different representation. But that is beside the present point. :)

Taking the derivative, or just finite differences, is a useful step in more ways than one. You are talking about taking differences over time, but differences over space can be used for edge detection, frequently thought of as the first step in visual processing. More generally, any transform that makes the data more sparse, or simpler, seems good -- which is of course what PCA does, and derivatives in time/space, and also the Fourier transform I think. The usefulness of these transforms springs from underlying regularities in the data.

That's not to say that I don't think some representations are fundamentally more useful than others -- for example, I know that some proofs are astronomically larger in 1st-order logic as compared to 2nd-order logic, even in domains where 1st-order logic is representationally sufficient.

The statement about time correction reminds me of a system called PURR-PUSS. It is Turing-complete in some sense, essentially by compounding time-delays, but I do not know exactly what sense (i.e., a Turing-complete *learner* is very different from a Turing-complete *programmable computer*... PURR-PUSS uses something in between called "soft teaching" if I recall correctly.) --Abram

On Fri, Dec 26, 2008 at 3:26 PM, Steve Richfield steve.richfi...@gmail.com wrote:

Abram, On 12/26/08, Abram Demski abramdem...@gmail.com wrote: Steve, Richard is right when he says temporal simultaneity is not a sufficient principle.

... and I fully agree. However, we must unfold this thing one piece at a time. Without the dp/dt trick, there doesn't seem to be any way to make unsupervised learning work, and I appear to be the first to stumble onto dp/dt. This is a whole new and unexplored world, where the things that stymied past unsupervised efforts fall out effortlessly, but some new challenges present themselves.

Suppose you present your system with the following sequences (letters could be substituted for sounds, colors, objects, whatever): ABCABCABCABC... AAABBBAAABBB... ABBAAAABB... ABBCCCEFF... ABACABADABACABAEABACABADABACABA... All of these sequences have concepts behind them. All of these concepts are immune to temporal-simultaneity learning (although the first could be learned by temporal adjacency, and the second by temporal adjacency with a delay of 3).

The way that wet neurons are built, this is unavoidable! Here is another snippet from my paper...

Time Correction

Electronics designers routinely use differentiation and integration to advance and retard timing. Phase-linear low-pass filters are often used to make short delays in a signal, and peaking capacitors were used in RTL (Resistor Transistor Logic) to differentiate inputs for quicker output. Further, wet neurons introduce their own propagation delays from input synapse to output synapse. If not somehow corrected, the net effect of this is a scrambling of the time that a given signal/node/term represents, which would result in relating signals together that are arbitrarily shifted in time. There seem to be three schools of thought regarding this:

1. No problem. This simply results in considering various things shifted arbitrarily in time. When wet neurons learn what works, this will result in recognizing time-sequenced phenomena. Arbitrary delays might also do a lot for artificial neurons.
2. Time correction could be instituted, e.g. through Taylor series signal extrapolation to in effect remove a neuron's delay, at the cost of introducing considerable noise into the result. My own simulations of Taylor series extrapolation functions showed that the first derivative may indeed help for small corrections, but beyond that, subtle changes in the shape of a transition cause wild changes in the extrapolated result, sometimes going so far as to produce short bursts of oscillation. Downstream neurons may then amplify these problems to produce havoc at the output of the artificial neural network.

3. The method utilized in CRAY computers might be in use, where all delays were a precise multiple of the clock period. This was achieved by using interconnecting wires cut to certain specific lengths, even though the length may be much longer than actually physically needed to interconnect two components. Perhaps wet neurons only come in certain very specific delays. There is some laboratory evidence for this, as each section of our brains has neurons with similar geometry within the group. This has been presumed to be an artifact of evolution and limited DNA space, but may in fact be necessary for proper time correction.

No one now knows which of these are in use in wet neurons. However, regardless of wet-neuron functionality, artificial
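[An aside on school 2: a toy numerical sketch, mine rather than Steve's simulations, of first-order Taylor extrapolation on a noisy sine wave shows the same qualitative behavior he reports: small corrections help, larger ones amplify noise badly.]

import numpy as np

rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0.0, 10.0, dt)
clean = np.sin(t)
noisy = clean + rng.normal(0.0, 0.01, t.size)   # small sensor noise

def taylor_extrapolate(x, dt, k):
    # First-order Taylor: x(t + k*dt) ~ x(t) + k*dt * x'(t),
    # with x' estimated by a finite difference.
    return x + k * dt * np.gradient(x, dt)

for k in (1, 10, 50):
    pred = taylor_extrapolate(noisy, dt, k)
    rms = np.sqrt(np.mean((pred[:-k] - clean[k:]) ** 2))
    print(f"correcting {k:2d} samples ahead: RMS error = {rms:.4f}")

[Because finite differencing amplifies high-frequency noise, and the extrapolation then multiplies the derivative by the correction distance, the error grows rapidly with k, which is consistent with the "considerable noise" caveat above.]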
RE: [agi] Universal intelligence test benchmark
From: Matt Mahoney [mailto:matmaho...@yahoo.com] --- On Fri, 12/26/08, Philip Hunt cabala...@googlemail.com wrote: Humans aren't particularly good at compressing data. Does this mean humans aren't intelligent, or is it a poor definition of intelligence? Humans are very good at predicting sequences of symbols, e.g. the next word in a text stream. However, humans are not very good at resetting their mental states and deterministically reproducing the exact sequence of learning steps and assignment of probabilities, which is what you need to decompress the data. Fortunately this is not a problem for computers.

Human memory storage may be lossy compression and recall may be decompression. Some very rare individuals remember every day of their life in vivid detail; I'm not sure what that means in terms of memory storage. How does consciousness fit into your compression intelligence modeling?

The thing about the word "compression" is that it is bass-ackwards when talking about intelligence. The word describes kind of an external effect, instead of an internal reconfiguration/re-representation. Also there is a difference between a goal of achieving maximum compression versus a goal of achieving a high-efficiency data description. Max compression implies hacks, kludges and a large decompressor.

Here is a simple example of human memory compression/decompression: when you think of space, air or emptiness, like driving across Kansas, looking at the moon, or waiting idly over a period of time, do you store the emptiness and redundancy or does it get compressed out? On the trip across Kansas you remember the starting point, rest stops, and the end, not the full duration. It's a natural compression. In fact I'd say this is a partially lossless compression though more lossy... maybe it is incidental but it is still there. John
Re: [agi] Universal intelligence test benchmark
Most compression tests are like defining intelligence as the ability to catch mice. They measure the ability of compressors to compress specific files. This tends to lead to hacks that are tuned to the benchmarks. For the generic intelligence test, all you know about the source is that it has a Solomonoff distribution (for a particular machine). I don't know how you could make the test any more generic.

IMO the test is *too* generic ... I don't think real-world AGI is mainly about being able to recognize totally general patterns in totally general datasets. I suspect that to do that, the best approach is ultimately going to be some AIXItl variant ... meaning it's a problem that's not really solvable using a real-world amount of resources. I suspect that all the AGI systems one can really build are SO BAD at this general problem that it's better to characterize AGI systems NOT in terms of how well they do at this general problem, but rather in terms of what classes of datasets/environments they are REALLY GOOD at recognizing patterns in.

I think the environments existing in the real physical and social world are drawn from a pretty specific probability distribution (compared to, say, the universal prior), and that for this reason, looking at problems of compression or pattern recognition across general program spaces without real-world-oriented biases is not going to lead to real-world AGI. The important parts of AGI design are the ones that (directly or indirectly) reflect the specific distribution of problems that the real world presents an AGI system. And this distribution is **really hard** to encapsulate in a text compression database. Because we don't know what this distribution is. And this is why we should be working on AGI systems that interact with the real physical and social world, or the most accurate simulations of it we can build. -- Ben G
Re: [agi] Universal intelligence test benchmark
2008/12/26 Matt Mahoney matmaho...@yahoo.com: Humans are very good at predicting sequences of symbols, e.g. the next word in a text stream.

Why not have that as your problem domain, instead of text compression?

Most compression tests are like defining intelligence as the ability to catch mice. They measure the ability of compressors to compress specific files. This tends to lead to hacks that are tuned to the benchmarks. For the generic intelligence test, all you know about the source is that it has a Solomonoff distribution (for a particular machine). I don't know how you could make the test any more generic.

It seems to me that you and Hutter are interested in a problem domain that consists of:

1. generating random Turing machines
2. running them to produce output
3. feeding the output as input to another program P, which will then guess future characters based on previous ones
4. having P use these guesses to do compression

May I suggest that instead you modify this problem domain by:

(a) removing clause 1 -- it's not fundamentally interesting that output comes from a Turing machine. Maybe instead make the output come from a program (written by humans and interesting to humans) in a normal programming language that people would actually use to write code in

(b) removing clause 4 -- compression is a bit of a red herring here; what's important is to predict future output based on past output

IMO if you made these changes, your problem domain would be a more useful one. While you're at it you may want to change the size of the chunks in each item of prediction, from characters to either strings or s-expressions. Though doing so doesn't fundamentally alter the problem. -- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
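[A minimal sketch of the prediction-only domain Philip proposes, i.e. clause 3 scored directly with no compression step. The class name and the toy order-n context model are illustrative assumptions, not code from the thread; "program P" here just guesses the most frequent continuation of the last n symbols.]

from collections import Counter, defaultdict

class ContextPredictor:
    # Toy stand-in for "program P": guess the next symbol using the
    # last n symbols as context (an order-n model), learning online.
    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)
        self.history = ""

    def predict(self):
        c = self.counts[self.history]
        return c.most_common(1)[0][0] if c else None

    def observe(self, sym):
        self.counts[self.history][sym] += 1
        self.history = (self.history + sym)[-self.n:]

def score(predictor, stream):
    # Fraction of symbols guessed correctly: prediction accuracy,
    # with compression removed as per suggestion (b).
    hits = 0
    for sym in stream:
        hits += predictor.predict() == sym
        predictor.observe(sym)
    return hits / len(stream)

for n in (1, 2, 4):
    print(n, score(ContextPredictor(n), "ABACABAD" * 40))

[On this stream the score improves as n grows, approaching 1.0 by n = 4; changing the chunk size from characters to strings or s-expressions, as Philip notes, would change the alphabet but not the shape of the problem.]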
Re: [agi] Universal intelligence test benchmark
2008/12/27 Ben Goertzel b...@goertzel.org: And this is why we should be working on AGI systems that interact with the real physical and social world, or the most accurate simulations of it we can build.

Or some other domain that may have some practical use, e.g. understanding program source code. -- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
Re: Real-world vs. universal prior (was Re: [agi] Universal intelligence test benchmark)
Suppose I take the universal prior and condition it on some real-world training data. For example, if you're interested in real-world vision, take 1000 frames of real video, and then the proposed probability distribution is the portion of the universal prior that explains the real video. (I can mathematically define this if there is interest, but I'm guessing the other people here can too, so maybe we can skip that. Speak up if I'm being too unclear.) Do you think the result is different in an important way from the real-world probability distribution you're looking for? -- Tim Freeman http://www.fungible.com t...@fungible.com

No, I think that in principle that's the right approach ... but that simple, artificial exercises like conditioning the prior on photos don't come close to capturing the richness of statistical structure in the physical universe ... or in the subsets of the physical universe that humans typically deal with... ben
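[Tim's construction can be written out with the standard definitions; this is textbook Solomonoff induction, not something new to the thread. Let M be the universal semimeasure on a universal prefix machine U,

M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|}

the total weight of programs p whose output begins with x. Conditioning on training data D (say, the encoded video frames) gives

M(x \mid D) \;=\; \frac{M(Dx)}{M(D)}

i.e., the renormalized weight of exactly those programs that already explain D and go on to produce x, which is the "portion of the universal prior that explains the real video".]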
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Much of AI and pretty much all of AGI is built on the proposition that we humans must code knowledge because the stupid machines can't efficiently learn it on their own, in short, that UNsupervised learning is difficult.

No, in fact almost **no** AGI is based on this proposition. Cyc is based strictly on this proposition ... some other GOFAI-ish systems like SOAR are based on weaker forms of this proposition ... but this is really a minority view in the AGI world, and a view taken by very few designs created in the last decade ... sociologically, it seems to be a view that peaked in the 70's and 80's... -- Ben G
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve, When I made the statement about Fourier I was thinking of JPEG encoding. A little digging found this book, which presents a unified approach to (low-level) computer vision based on the Fourier transform: http://books.google.com/books?id=1wJuTMbNT0MCdq=fourier+visionprintsec=frontcoversource=blots=3ogSJ2i5uWsig=ZdvvWvu82q8UX1c5Abq6hWvgZCYhl=ensa=Xoi=book_resultresnum=2ct=result#PPA4,M1

But that is beside the present point. :) Probably so. I noticed that you recently graduated, so I thought that I would drop that thought to make (or unmake) your day. :)

I should really update that. It's been a while now.

Generally, any transform that makes the data more sparse, or simpler, seems good. Certainly if it results in extracting some useful figure of merit. -- which is of course what PCA does, Sometimes yes, and sometimes no. I am looking at incremental PCA approaches that reliably extract separate figures of merit rather than smushed-together figures of merit as PCA often does.

How do you define "figures of merit"? Sounds like an ill-defined problem to me. We don't know which features we *really* want to extract from an image until we know the utility function of the environment, and so know what information will help us achieve our goals. --Abram

On Sat, Dec 27, 2008 at 12:01 AM, Steve Richfield steve.richfi...@gmail.com wrote:

Abram, On 12/26/08, Abram Demski abramdem...@gmail.com wrote: Steve, It is strange to claim that prior PhDs will be worthless when what you are suggesting is that we apply the standard methods to a different representation.

Much of AI and pretty much all of AGI is built on the proposition that we humans must code knowledge because the stupid machines can't efficiently learn it on their own, in short, that UNsupervised learning is difficult. Note that in nature, UNsupervised learning handily outperforms supervised learning. What good is supervised NN technology when UNsupervised NNs will perform MUCH better? What good are a few hand-coded AGI rules and the engine that runs them, when an UNsupervised AGI can learn them orders of magnitude faster than cities full of programmers? Note my prior post where I explain that AGIs must either abandon UNsupervised learning or switch to a NN-like implementation. In short, easy UNsupervised learning will change things about as much as the switch from horse and buggy to automobiles, leaving present PhDs in the position of blacksmiths and historians. Sure, blacksmiths had transferable skills, but they weren't worth much and they weren't respected at all. In the 1980s, countless top computer people (including myself) had to expunge all references to mainframe computers from our resumes in order to find work in a microcomputer-dominated field. I expect to see rounds of the same sort of insanity when UNsupervised learning emerges.

But that is beside the present point. :)

Probably so. I noticed that you recently graduated, so I thought that I would drop that thought to make (or unmake) your day.

Taking the derivative, or just finite differences, is a useful step in more ways than one. You are talking about taking differences over time, but differences over space can be used for edge detection, frequently thought of as the first step in visual processing.

Correct. My paper goes into using any dimension that is differentiable. Note that continuous eye movement converts a physical dimension to the time domain.
More generally, any transform that makes the data more sparse, or simpler, seems good.

Certainly if it results in extracting some useful figure of merit.

-- which is of course what PCA does,

Sometimes yes, and sometimes no. I am looking at incremental PCA approaches that reliably extract separate figures of merit rather than smushed-together figures of merit as PCA often does. Another problem with classical PCA is that it can't provide real-time learning, but instead works via a sort of batch processing of statistics collected in the array that is being transformed.

and derivatives in time/space, and also the Fourier transform I think. The usefulness of these transforms springs from underlying regularities in the data.

Hmmm, I don't see where a Fourier transform would enter the cognitive process. Perhaps you see something that I have missed?

That's not to say that I don't think some representations are fundamentally more useful than others -- for example, I know that some proofs are astronomically larger in 1st-order logic as compared to 2nd-order logic, even in domains where 1st-order logic is representationally sufficient. The statement about time correction reminds me of a system called PURR-PUSS.

However, as I understand it, the Purposeful Unprimed Real-world Robot with Predictors Using Short Segments still relied on rewards and punishments for learning.

It is Turing-complete in some sense, essentially by compounding time-delays, but I do not
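[For readers following the incremental-PCA subthread: one standard online alternative to batch PCA is Oja's rule, a single Hebbian-style update that converges to the first principal component without accumulating batch statistics. A minimal sketch, offered as illustration rather than as Steve's approach:]

import numpy as np

def oja_first_component(X, lr=0.01, epochs=20, seed=0):
    # Oja's rule: a one-neuron online learner whose weight vector
    # converges to the leading principal component of centered data.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x                    # neuron output
            w += lr * y * (x - y * w)    # Hebbian term minus decay; keeps |w| near 1
    return w / np.linalg.norm(w)

# Toy data: variance 9 along the first axis, 0.25 along the second.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
print(oja_first_component(X - X.mean(axis=0)))   # ~ [1, 0] up to sign

[Each sample updates the weights immediately, so the learning is genuinely real-time rather than batch, which addresses the real-time objection above, though it finds one component at a time rather than a full decomposition.]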
RE: [agi] Universal intelligence test benchmark
--- On Fri, 12/26/08, John G. Rose johnr...@polyplexic.com wrote: Human memory storage may be lossy compression and recall may be decompression. Some very rare individuals remember every day of their life in vivid detail, not sure what that means in terms of memory storage.

Human perception is a form of lossy compression which has nothing to do with the lossless compression that I use to measure prediction accuracy. Many lossless compressors use lossy filters too. A simple example is an order-n context where we discard everything except the last n symbols.

How does consciousness fit into your compression intelligence modeling?

It doesn't. Why is consciousness important?

Max compression implies hacks, kludges and a large decompressor.

As I discovered with the large text benchmark. -- Matt Mahoney, matmaho...@yahoo.com
Spatial indexing (was Re: [agi] Universal intelligence test benchmark)
--- On Fri, 12/26/08, J. Andrew Rogers and...@ceruleansystems.com wrote: For example, there is no general indexing algorithm described in computer science.

Which was my thesis topic and is the basis of my AGI design. http://www.mattmahoney.net/agi2.html (I wanted to do my dissertation on AI/compression, but funding issues got in the way.) Distributed indexing is critical to an AGI design consisting of a huge number of relatively dumb specialists and an infrastructure for getting messages to the right ones. In my thesis, I proposed a vector space model where messages are routed in O(n) time over n nodes. The problem is that the number of connections per node has to be on the order of the number of dimensions in the search space. For text, that is about 10^5. There are many other issues, of course, such as fault tolerance, security and ownership. There has to be an economic incentive to contribute knowledge and computing resources, because it is too expensive for anyone to own it.

The human genome size has no meaningful relationship to the complexity of coding AGI.

Yes it does. It is an upper bound on the complexity of a baby. -- Matt Mahoney, matmaho...@yahoo.com
Re: [agi] Universal intelligence test benchmark
--- On Fri, 12/26/08, Ben Goertzel b...@goertzel.org wrote: IMO the test is *too* generic ...

Hopefully this work will lead to general principles of learning and prediction that could be combined with more specific techniques. For example, a common way to compress text is to encode it with one symbol per word and feed the result to a general purpose compressor. Generic compression should improve the back end.

My concern is the data is not generic enough. A string has an algorithmic complexity that is independent of language up to a small constant, but in practice that constant (the algorithmic complexity of the compiler) can be much larger than the string. I have not been able to find a good solution to this problem. I realize there are some very simple, Turing-complete systems, such as a 2-state machine with a 3-symbol alphabet, and a 6-state binary machine, as well as various cellular automata (like rule 110). The problem is that programming simple machines often requires long programs to do simple things. For example, it is difficult to find a simple language where the smallest program to output 100 zero bits is shorter than 100 bits. Existing languages and instruction sets tend to be complex and ad hoc in order to allow programmers to be expressive. -- Matt Mahoney, matmaho...@yahoo.com
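[The "independent of language up to a small constant" claim is the invariance theorem of Kolmogorov complexity, stated here for reference: for any two universal machines U and V there is a constant c_{U,V}, depending only on the machines and not on the string, such that for all x

|K_U(x) - K_V(x)| \;\le\; c_{U,V}

Matt's worry is that c_{U,V} is essentially the length of a translator between the two languages, and for the very simple machines above that translator can be far longer than the strings being measured.]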
Re: [agi] Universal intelligence test benchmark
--- On Fri, 12/26/08, Philip Hunt cabala...@googlemail.com wrote: Humans are very good at predicting sequences of symbols, e.g. the next word in a text stream. Why not have that as your problem domain, instead of text compression?

That's the same thing, isn't it?

While you're at it you may want to change the size of the chunks in each item of prediction, from characters to either strings or s-expressions. Though doing so doesn't fundamentally alter the problem.

In the generic test, the fundamental units are bits. It's not entirely suitable for most existing compressors, which tend to be byte oriented. But they are only byte oriented because a lot of data is structured that way. In general, it doesn't need to be. -- Matt Mahoney, matmaho...@yahoo.com
Re: Spatial indexing (was Re: [agi] Universal intelligence test benchmark)
--- On Sat, 12/27/08, Matt Mahoney matmaho...@yahoo.com wrote: In my thesis, I proposed a vector space model where messages are routed in O(n) time over n nodes.

Oops, O(log n). -- Matt Mahoney, matmaho...@yahoo.com
Re: [agi] Introducing Steve's Theory of Everything in cognition.
On Fri, Dec 26, 2008 at 11:56 PM, Abram Demski abramdem...@gmail.com wrote: That's not to say that I don't think some representations are fundamentally more useful than others -- for example, I know that some proofs are astronomically larger in 1st-order logic as compared to 2nd-order logic, even in domains where 1st-order logic is representationally sufficient.

Do you have any online references handy for these? One of the things I'm still trying to figure out is to just what extent it is necessary to go to higher-order logic to make interesting statements about program code, and this sounds like useful data.
RE: [agi] Universal intelligence test benchmark
From: Matt Mahoney [mailto:matmaho...@yahoo.com] How does consciousness fit into your compression intelligence modeling? It doesn't. Why is consciousness important?

I was just prodding you on this. Many people on this list talk about the requirements of consciousness for AGI, and I was imagining some sort of consciousness in one of your command-line compressors :) I've yet to grasp the relationship between intelligence and consciousness, though lately I think consciousness may be more of an evolutionary social thing. Home-grown digital intelligence, since it is a loner, may not require much consciousness IMO.

Max compression implies hacks, kludges and a large decompressor. As I discovered with the large text benchmark.

Yep, and the behavior of the metrics near max theoretical compression is erratic, I think? John