--- [EMAIL PROTECTED] wrote:

> Relating to the idea that text compression (as demonstrated by general
> compression algorithms) is a measure of intelligence,
> Claims:
> (1) To understand natural language requires knowledge (CONTEXT) of the
> social world(s) it refers to.
> (2) Communication includes (at most) a shadow of the context necessary
> to understand it.
> 
> Given (1), no context-free analysis can understand natural language.
> Given (2), no adaptive agent can learn (proper) understanding of natural
> language given only texts.
> 
> For human-like understanding, an AGI would need to participate in
> (human) social society.

The ideal test set for text compression as a test for AI would be 1 GB of chat
sessions, such as the transcripts between judges and human confederates in the
Loebner contests.  Since I did not have this much data available, I used
Wikipedia.  It lacks a discourse model, but the problem is otherwise similar in
that good compression requires vast real-world knowledge.  For example,
compressing or predicting:

  Q. What color are roses?
  A. ___

is almost the same kind of problem as compressing or predicting:

  Roses are ___

Of course, the compressor would be learning an ungrounded language model. 
That should be sufficient for passing a Turing test.  A model need not have
actually seen a rose to know the answer to the question.  I don't think it is
possible to find any knowledge that could be tested through a text-only
channel that could not also be learned through a text-only channel.  Whether
sufficient testable knowledge is actually available in a training corpus is
another question.
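
To make the link between prediction and compression concrete, here is a
minimal sketch of the "Roses are ___" example.  It assumes an ideal entropy
coder, which spends about -log2(p) bits on a symbol the model assigned
probability p.  The four-line corpus, the add-one smoothing, and the names
CORPUS, next_word_model and code_length_bits are all hypothetical
illustrations, not part of any actual benchmark or compressor.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real test would use something like 1 GB of text.
CORPUS = [
    "roses are red",
    "roses are red",
    "roses are pink",
    "violets are blue",
]


def next_word_model(corpus, context):
    """Estimate P(next word | context) by counting, with add-one smoothing."""
    vocab = sorted({w for line in corpus for w in line.split()})
    counts = Counter()
    n = len(context)
    for line in corpus:
        words = line.split()
        for i in range(len(words) - n):
            if tuple(words[i:i + n]) == context:
                counts[words[i + n]] += 1
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def code_length_bits(p):
    """Bits an ideal entropy coder would spend on a symbol of probability p."""
    return -math.log2(p)


model = next_word_model(CORPUS, ("roses", "are"))
best_guess = max(model, key=model.get)  # the model's answer to "Roses are ___"

for word in ("red", "pink", "blue"):
    print(f"{word}: p={model[word]:.3f}, "
          f"cost={code_length_bits(model[word]):.2f} bits")
```

The same probability table does both jobs: its most likely entry is the
prediction, and the cost in bits of the word that actually follows is what a
compressor would pay.  A model that knows more about roses assigns higher
probability to the right word and so produces a smaller file.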

I don't claim that lossless compression could be used to test for AGI, just
AI.  A lossless image compression test would be almost useless because the
small amount of perceptible information in an image would be overwhelmed by
incompressible pixel noise.  A lossy test would be appropriate, but would
require subjective human evaluation of the quality of the reproduced output. 
For text, a strictly objective lossless test is possible because the
perceptible content of text is a large fraction of the total content.
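
A rough illustration of the noise argument, assuming zlib as a stand-in for
a general-purpose lossless compressor: uniformly random bytes play the role
of pixel noise, and a stream of words drawn from a small hypothetical
vocabulary plays the role of text.  Neither input is real image or real
language data, and the helper name compression_ratio is my own.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for pixel noise: uniformly random bytes carry no learnable structure.
noise = bytes(rng.getrandbits(8) for _ in range(20000))

# Stand-in for text: words drawn from a small, hypothetical vocabulary.
WORDS = ["roses", "are", "red", "violets", "blue", "what", "color", "pink"]
text = " ".join(rng.choice(WORDS) for _ in range(4000)).encode("ascii")

noise_ratio = compression_ratio(noise)
text_ratio = compression_ratio(text)

print(f"noise: {noise_ratio:.3f}")
print(f"text:  {text_ratio:.3f}")
```

The noise should come out at roughly its original size, so in a lossless
image test most of the score would measure bits that no model can predict.
The structured stream shrinks considerably, which is why differences between
text models show up clearly in the compressed size.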


-- Matt Mahoney, [EMAIL PROTECTED]
