Thinking out loud here, since I find the relationship between compression and intelligence interesting:
Compression in itself has the overriding goal of reducing storage bits. Intelligence compresses only incidentally, as a matter of resource management. But I don't think the connection is ONLY incidental: knowledge has structure which can be organized and will naturally collapse into a lower-complexity storage state. Things have order, grounded in physics and other mathematical relationships, which makes the relationship between compression, stored knowledge, and intelligence intriguing.

Knowledge can also be compressed so inefficiently that it inhibits extraction and other operations, so compression and intelligence differ in how they weigh computational expense. Optimal intelligence would have a variational compression structure; in other words, some material needs fast access with minimal decompression cost, while other material has high storage priority and its access time and computational expense are not a concern.

And when you say the word "compression" there is also an implicit question of utility. A compressor with general intelligence built in still has the goal of reducing storage bits. I think compression can be a byproduct of the knowledge a general intelligence stores. But if you assign an intelligent compressor the goal of taking input data and reducing its storage space, the result may still be a series of hacks, because hacks may be the best way to accomplish that goal. Sure, there may be undiscovered hacks that require general intelligence to uncover. And a generally intelligent compressor may produce richer lossily compressed data from varied sources; the best lossy compressor is probably generally intelligent.

They are very similar, as you indicate... but when you get really lossy, when you start asking questions of your compressed data that go beyond reconstructing the uncompressed input, there is a difference there. Compression itself is just one-dimensional.
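That variational structure can be sketched as a toy two-tier store: hot items are lightly compressed for cheap access, cold items are packed as hard as possible. This is purely illustrative (the class and tier names are mine), using zlib compression levels as a stand-in for the access-cost vs. storage trade-off:

```python
import zlib

class TieredStore:
    """Toy two-tier store (illustrative): hot items trade compression
    ratio for speed, cold items trade speed for ratio."""

    def __init__(self):
        self.hot = {}   # zlib level 1: fastest, largest blobs
        self.cold = {}  # zlib level 9: slowest, smallest blobs

    def put(self, key, data, hot=True):
        tier, level = (self.hot, 1) if hot else (self.cold, 9)
        tier[key] = zlib.compress(data, level)

    def get(self, key):
        blob = self.hot.get(key) or self.cold.get(key)
        return zlib.decompress(blob)

store = TieredStore()
payload = b"knowledge has structure " * 500
store.put("fast", payload, hot=True)    # access time matters
store.put("small", payload, hot=False)  # storage matters
assert store.get("fast") == store.get("small") == payload
# for repetitive input, level 9 should not produce a larger blob than level 1
assert len(store.cold["small"]) <= len(store.hot["fast"])
```

Real systems make the same trade in many places, e.g. fast LZ codecs for hot data versus slow, dense entropy coders for archives.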
Intelligence is multidimensional.

John

> -----Original Message-----
> From: Matt Mahoney [mailto:[EMAIL PROTECTED]
> Sent: Friday, September 05, 2008 6:39 PM
> To: agi@v2.listbox.com
> Subject: Re: Language modeling (was Re: [agi] draft for comment)
>
> --- On Fri, 9/5/08, Pei Wang <[EMAIL PROTECTED]> wrote:
>
> > Like too many existing AI works, my disagreement with you is not that
> > much on the solution you proposed (I can see the value), but on the
> > problem you specified as the goal of AI. For example, I have no doubt
> > about the theoretical and practical values of compression, but don't
> > think it has much to do with intelligence.
>
> In http://cs.fit.edu/~mmahoney/compression/rationale.html I explain why
> text compression is an AI problem. To summarize, if you know the
> probability distribution of text, then you can compute P(A|Q) for any
> question Q and answer A to pass the Turing test. Compression allows you
> to precisely measure the accuracy of your estimate of P. Compression
> (actually, word perplexity) has been used since the early 1990s to
> measure the quality of language models for speech recognition, since it
> correlates well with word error rate.
>
> The purpose of this work is not to solve general intelligence, such as
> the universal intelligence proposed by Legg and Hutter [1]. That is not
> computable, so you have to make some arbitrary choice, with regard to
> test environments, about what problems you are going to solve. I believe
> the goal of AGI should be to do useful work for humans, so I am making a
> not-so-arbitrary choice to solve a problem that is central to what most
> people regard as useful intelligence.
>
> I had hoped that my work would lead to an elegant theory of AI, but that
> hasn't been the case. Rather, the best compression programs were
> developed as a series of thousands of hacks and tweaks, e.g. change a 4
> to a 5 because it gives 0.002% better compression on the benchmark.
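Matt's point that compression precisely measures the accuracy of an estimate of P can be made concrete: an ideal arithmetic coder spends -log2 P(x) bits on a symbol x, so bits per character, and hence perplexity, fall straight out of the model. A minimal sketch, assuming a Laplace-smoothed unigram character model (the model choice and toy corpus are my illustration, not anything from Matt's benchmark):

```python
import math
from collections import Counter

def bits_per_char(train_text, test_text):
    """Ideal code length of test_text, in bits per character, under a
    Laplace-smoothed unigram character model estimated from train_text.
    An arithmetic coder driven by this model would approach this rate."""
    counts = Counter(train_text)
    total = len(train_text)
    vocab = set(train_text) | set(test_text)

    def p(c):
        return (counts[c] + 1) / (total + len(vocab))  # add-one smoothing

    return sum(-math.log2(p(c)) for c in test_text) / len(test_text)

train_text = "the quick brown fox jumps over the lazy dog " * 20
test_text = "the lazy fox"
bpc = bits_per_char(train_text, test_text)
perplexity = 2 ** bpc  # the quantity used to score speech-recognition LMs
assert 0 < bpc < 8     # well below the 8 bits/char of raw byte storage
```

A better model assigns the test text higher probability, which shows up directly as fewer bits per character, which is exactly the quantity a lossless compressor is judged on.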
> The result is an opaque mess. I guess I should have seen it coming,
> since it is predicted by information theory (e.g. [2]).
>
> Nevertheless, the architectures of the best text compressors are
> consistent with cognitive development models, i.e. phoneme (or letter)
> sequences -> lexical -> semantics -> syntax, which are themselves
> consistent with layered neural architectures. I already described a
> neural semantic model in my last post. I also did work supporting
> Hutchens and Alder, showing that lexical models can be learned from
> n-gram statistics, consistent with the observation that babies learn the
> rules for segmenting continuous speech before they learn any words [3].
>
> I agree it should also be clear that semantics is learned before
> grammar, contrary to the way artificial languages are processed. Grammar
> requires semantics, but not the other way around. Search engines work
> using semantics only. Yet we cannot parse sentences like "I ate pizza
> with Bob", "I ate pizza with pepperoni", or "I ate pizza with
> chopsticks" without semantics.
>
> My benchmark does not prove that there aren't better language models,
> but it is strong evidence. It represents the work of about 100
> researchers who have tried and failed to find more accurate, faster, or
> less memory-intensive models. The resource requirements seem to increase
> as we go up the chain from n-grams to grammar, contrary to symbolic
> approaches. This is my argument for why I think AI is bound by lack of
> hardware, not lack of theory.
>
> 1. Legg, Shane, and Marcus Hutter (2006), A Formal Measure of Machine
> Intelligence, Proc. Annual Machine Learning Conference of Belgium and
> The Netherlands (Benelearn-2006), Ghent, 2006.
> http://www.vetta.org/documents/ui_benelearn.pdf
>
> 2.
> Legg, Shane (2006), Is There an Elegant Universal Theory of
> Prediction?, Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle
> Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno,
> Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf
>
> 3. M. Mahoney (2000), A Note on Lexical Acquisition in Text without
> Spaces, http://cs.fit.edu/~mmahoney/dissertation/lex1.html
>
> -- Matt Mahoney, [EMAIL PROTECTED]
>
> -------------------------------------------
> agi
> Archives: https://www.listbox.com/member/archive/303/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/303/
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
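Matt's claim above, that lexical models can be learned from n-gram statistics, can be illustrated with a toy segmenter that puts word boundaries where successor variety jumps, i.e. where the next character stops being predictable. This is only a sketch of the general idea, not the method of [3]; the four-word corpus and the variety threshold here are contrived:

```python
from collections import defaultdict
from itertools import permutations

def train(corpus):
    """Map each character pair to the set of characters that follow it."""
    succ = defaultdict(set)
    for i in range(len(corpus) - 2):
        succ[corpus[i:i + 2]].add(corpus[i + 2])
    return succ

def segment(text, succ):
    """Place a boundary after a pair with high successor variety:
    inside a word the next character is nearly determined; at a word
    end it is not (a toy version of learning a lexicon from n-grams)."""
    out = text[:2]
    for i in range(2, len(text)):
        if len(succ[text[i - 2:i]]) > 1:  # unpredictable: word boundary
            out += " "
        out += text[i]
    return out

words = ["the", "dog", "cat", "ran"]
# unsegmented "speech": every ordering of the words, run together
corpus = "".join("".join(p) for p in permutations(words))
succ = train(corpus)
print(segment("thedogran", succ))  # -> the dog ran
```

Inside a word the next character is nearly determined by context, while at a word end it is not; that asymmetry is a purely statistical signal, available before any word is known.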