Thinking out loud here, since I find the relationship between compression and intelligence interesting:
Compression in itself has the overriding goal of reducing storage bits. Intelligence compresses only incidentally, as a matter of resource management. But I don't think the connection is ONLY incidental: knowledge has structure which can be organized and will naturally collapse into a lower-complexity storage state. Things have order, grounded in physics and other mathematical relationships, which makes the relationship between compression, stored knowledge, and intelligence intriguing.

Knowledge can also be compressed so inefficiently that it inhibits extraction and other operations, so compression and intelligence differ in how they weigh computational expense. Optimal intelligence would have a variational compression structure; in other words, some material needs fast access with minimal decompression cost, while other material has high storage priority and its access time and computational expense are not a concern.

And when you say the word "compression" there is also an implicit question of utility. A compressor with general intelligence built in still has the goal of reducing storage bits. I think compression can be a byproduct of the knowledge a general intelligence stores. But if you assign an intelligent compressor the goal of taking input data and reducing its storage space, the result may still be a series of hacks, because hacks may be the best way to accomplish that goal. Sure, there may be undiscovered hacks that require general intelligence to uncover. And a generally intelligent compressor may produce richer lossily compressed data from varied sources; the best lossy compressor is probably generally intelligent.

They are very similar, as you indicate... but when you get really lossy, when you start asking questions of your compressed data that go beyond reconstructing the uncompressed input, there is a difference there. Compression itself is just one-dimensional.
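That variational structure can be sketched as a toy two-tier store: hot items are lightly compressed for cheap access, cold items are packed as hard as possible. This is purely illustrative (the class and tier names are mine), using zlib compression levels as a stand-in for the access-cost vs. storage trade-off:

```python
import zlib

class TieredStore:
    """Toy two-tier store (illustrative): hot items trade compression
    ratio for speed, cold items trade speed for ratio."""

    def __init__(self):
        self.hot = {}   # zlib level 1: fastest, largest blobs
        self.cold = {}  # zlib level 9: slowest, smallest blobs

    def put(self, key, data, hot=True):
        tier, level = (self.hot, 1) if hot else (self.cold, 9)
        tier[key] = zlib.compress(data, level)

    def get(self, key):
        blob = self.hot.get(key) or self.cold.get(key)
        return zlib.decompress(blob)

store = TieredStore()
payload = b"knowledge has structure " * 500
store.put("fast", payload, hot=True)    # access time matters
store.put("small", payload, hot=False)  # storage matters
assert store.get("fast") == store.get("small") == payload
# for repetitive input, level 9 should not produce a larger blob than level 1
assert len(store.cold["small"]) <= len(store.hot["fast"])
```

Real systems make the same trade in many places, e.g. fast LZ codecs for hot data versus slow, dense entropy coders for archives.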
Intelligence is multidimensional.

John

> -----Original Message-----
> From: Matt Mahoney [mailto:[EMAIL PROTECTED]
> Sent: Friday, September 05, 2008 6:39 PM
> To: agi@v2.listbox.com
> Subject: Re: Language modeling (was Re: [agi] draft for comment)
>
> --- On Fri, 9/5/08, Pei Wang <[EMAIL PROTECTED]> wrote:
>
> > Like too many existing AI works, my disagreement with you is not that
> > much on the solution you proposed (I can see the value), but on the
> > problem you specified as the goal of AI. For example, I have no doubt
> > about the theoretical and practical values of compression, but don't
> > think it has much to do with intelligence.
>
> In http://cs.fit.edu/~mmahoney/compression/rationale.html I explain why
> text compression is an AI problem. To summarize, if you know the
> probability distribution of text, then you can compute P(A|Q) for any
> question Q and answer A to pass the Turing test. Compression allows you
> to precisely measure the accuracy of your estimate of P. Compression
> (actually, word perplexity) has been used since the early 1990s to
> measure the quality of language models for speech recognition, since it
> correlates well with word error rate.
>
> The purpose of this work is not to solve general intelligence, such as
> the universal intelligence proposed by Legg and Hutter [1]. That is not
> computable, so you have to make some arbitrary choice, with regard to
> test environments, about what problems you are going to solve. I believe
> the goal of AGI should be to do useful work for humans, so I am making a
> not-so-arbitrary choice to solve a problem that is central to what most
> people regard as useful intelligence.
>
> I had hoped that my work would lead to an elegant theory of AI, but that
> hasn't been the case. Rather, the best compression programs were
> developed as a series of thousands of hacks and tweaks, e.g. change a 4
> to a 5 because it gives 0.002% better compression on the benchmark.
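Matt's point that compression precisely measures the accuracy of an estimate of P can be made concrete: an ideal arithmetic coder spends -log2 P(x) bits on a symbol x, so bits per character, and hence perplexity, fall straight out of the model. A minimal sketch, assuming a Laplace-smoothed unigram character model (the model choice and toy corpus are my illustration, not anything from Matt's benchmark):

```python
import math
from collections import Counter

def bits_per_char(train_text, test_text):
    """Ideal code length of test_text, in bits per character, under a
    Laplace-smoothed unigram character model estimated from train_text.
    An arithmetic coder driven by this model would approach this rate."""
    counts = Counter(train_text)
    total = len(train_text)
    vocab = set(train_text) | set(test_text)

    def p(c):
        return (counts[c] + 1) / (total + len(vocab))  # add-one smoothing

    return sum(-math.log2(p(c)) for c in test_text) / len(test_text)

train_text = "the quick brown fox jumps over the lazy dog " * 20
test_text = "the lazy fox"
bpc = bits_per_char(train_text, test_text)
perplexity = 2 ** bpc  # the quantity used to score speech-recognition LMs
assert 0 < bpc < 8     # well below the 8 bits/char of raw byte storage
```

A better model assigns the test text higher probability, which shows up directly as fewer bits per character, which is exactly the quantity a lossless compressor is judged on.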
> The result is an opaque mess. I guess I should have seen it coming,
> since it is predicted by information theory (e.g. [2]).
>
> Nevertheless, the architectures of the best text compressors are
> consistent with cognitive development models, i.e. phoneme (or letter)
> sequences -> lexical -> semantics -> syntax, which are themselves
> consistent with layered neural architectures. I already described a
> neural semantic model in my last post. I also did work supporting
> Hutchens and Alder, showing that lexical models can be learned from
> n-gram statistics, consistent with the observation that babies learn the
> rules for segmenting continuous speech before they learn any words [3].
>
> I agree it should also be clear that semantics is learned before
> grammar, contrary to the way artificial languages are processed. Grammar
> requires semantics, but not the other way around. Search engines work
> using semantics only. Yet we cannot parse sentences like "I ate pizza
> with Bob", "I ate pizza with pepperoni", or "I ate pizza with
> chopsticks" without semantics.
>
> My benchmark does not prove that there aren't better language models,
> but it is strong evidence. It represents the work of about 100
> researchers who have tried and failed to find more accurate, faster, or
> less memory-intensive models. The resource requirements seem to increase
> as we go up the chain from n-grams to grammar, contrary to symbolic
> approaches. This is my argument for why I think AI is bound by lack of
> hardware, not lack of theory.
>
> 1. Legg, Shane, and Marcus Hutter (2006), A Formal Measure of Machine
> Intelligence, Proc. Annual Machine Learning Conference of Belgium and
> The Netherlands (Benelearn-2006), Ghent, 2006.
> http://www.vetta.org/documents/ui_benelearn.pdf
>
> 2.
> Legg, Shane (2006), Is There an Elegant Universal Theory of
> Prediction?, Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle
> Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno,
> Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf
>
> 3. M. Mahoney (2000), A Note on Lexical Acquisition in Text without
> Spaces, http://cs.fit.edu/~mmahoney/dissertation/lex1.html
>
> -- Matt Mahoney, [EMAIL PROTECTED]
>
> -------------------------------------------
> agi
> Archives: https://www.listbox.com/member/archive/303/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/303/
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
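Matt's claim above, that lexical models can be learned from n-gram statistics, can be illustrated with a toy segmenter that puts word boundaries where successor variety jumps, i.e. where the next character stops being predictable. This is only a sketch of the general idea, not the method of [3]; the four-word corpus and the variety threshold here are contrived:

```python
from collections import defaultdict
from itertools import permutations

def train(corpus):
    """Map each character pair to the set of characters that follow it."""
    succ = defaultdict(set)
    for i in range(len(corpus) - 2):
        succ[corpus[i:i + 2]].add(corpus[i + 2])
    return succ

def segment(text, succ):
    """Place a boundary after a pair with high successor variety:
    inside a word the next character is nearly determined; at a word
    end it is not (a toy version of learning a lexicon from n-grams)."""
    out = text[:2]
    for i in range(2, len(text)):
        if len(succ[text[i - 2:i]]) > 1:  # unpredictable: word boundary
            out += " "
        out += text[i]
    return out

words = ["the", "dog", "cat", "ran"]
# unsegmented "speech": every ordering of the words, run together
corpus = "".join("".join(p) for p in permutations(words))
succ = train(corpus)
print(segment("thedogran", succ))  # -> the dog ran
```

Inside a word the next character is nearly determined by context, while at a word end it is not; that asymmetry is a purely statistical signal, available before any word is known.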