> Thanks for taking the time to explain your ideas in detail. As I said,
> our different opinions on how to do AI come from our very different
> understanding of "intelligence". I don't take "passing Turing Test" as
> my research goal (as explained in
> http://nars.wang.googlepages.com/wang.logic_intelligence.pdf and
> http://nars.wang.googlepages.com/wang.AI_Definitions.pdf). I disagree
> with Hutter's approach, not because his SOLUTION is not computable,
> but because his PROBLEM is too idealized and simplified to be relevant
> to the actual problems of AI.

I don't advocate the Turing test as the ideal test of intelligence. Turing
himself was aware of the problem when he gave an example of a computer
answering an arithmetic problem incorrectly in his famous 1950 paper (the
correct sum below is 105721, not 105621):

Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces.  You have only K at K6 and R at R1.  
It is your move.  What do you play?
A: (After a pause of 15 seconds) R-R8 mate.

I prefer a "preference test", which a machine passes if you prefer to talk to
it over a human. Such a machine would be too fast and make too few errors to
pass a Turing test. For example, if you had to add two large numbers, I think
you would prefer to use a calculator rather than ask someone. You could, I
suppose, measure intelligence as the fraction of questions for which the
machine gives the preferred answer, which would be 1/4 in Turing's example.

If you know the probability distribution P of text, and therefore know the 
distribution P(A|Q) for any question Q and answer A, then to pass the Turing 
test you would randomly choose answers from this distribution. But to pass the 
preference test for all Q, you would choose A that maximizes P(A|Q) because the 
most probable answer is usually the correct one. Text compression measures 
progress toward either test.
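
To make the difference between the two strategies concrete, here is a minimal
sketch in Python. The distribution P_ANSWER and both function names are
hypothetical, and the probabilities are made up for illustration; a real
system would get P(A|Q) from a language model. Sampling reproduces human-like
errors at their natural rate, while taking the maximum always returns the
most probable answer.

```python
import random

# Hypothetical toy distribution P(A|Q) for the question "Add 34957 to 70764."
# The probabilities are invented for illustration only.
P_ANSWER = {"105721": 0.90, "105621": 0.07, "105731": 0.03}

def turing_answer(p, rng):
    """Turing-test strategy: sample an answer in proportion to P(A|Q)."""
    answers = list(p)
    weights = [p[a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]

def preferred_answer(p):
    """Preference-test strategy: return the answer that maximizes P(A|Q)."""
    return max(p, key=p.get)

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [turing_answer(P_ANSWER, rng) for _ in range(10000)]
    errors = sum(1 for a in samples if a != preferred_answer(P_ANSWER))
    print("sampled error rate:", errors / len(samples))
    print("preferred answer:", preferred_answer(P_ANSWER))
```

With these made-up numbers the sampler gives a wrong sum roughly one time in
ten, as a person might, while the maximizer never does.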

I believe that compression measures your definition of intelligence, i.e. 
adaptation given insufficient knowledge and resources. In my benchmark, there 
are two parts: the size of the decompression program, which measures the 
initial knowledge, and the compressed size, which measures prediction errors 
that occur as the system adapts. Programs must also meet practical time and 
memory constraints to be listed in most benchmarks.
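
As a rough illustration of the two-part score, the sketch below charges each
byte -log2 of the probability an adaptive model assigned to it, then adds the
size of the decompressor. The order-0 byte-frequency model and the function
names are assumptions chosen for brevity; they are not the models used by
actual benchmark entries, and a real compressor would add a small arithmetic
coding overhead.

```python
import math

def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length of data under an adaptive order-0 byte model.

    Each byte costs -log2(p), where p is the probability the model assigned
    to it before seeing it. Poor predictions early on cost more bits; the
    cost falls as the counts adapt to the data.
    """
    counts = [1] * 256          # start with no knowledge: every byte equally likely
    total = 256
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)
        counts[b] += 1          # adapt after each observed byte
        total += 1
    return bits

def benchmark_score_bytes(decompressor_size_bytes: int, data: bytes) -> int:
    """Two-part score: initial knowledge plus accumulated prediction error."""
    return decompressor_size_bytes + math.ceil(adaptive_code_length_bits(data) / 8)

if __name__ == "__main__":
    text = b"the cat sat on the mat. " * 100
    print("raw size:", len(text))
    print("score with a hypothetical 1000-byte decompressor:",
          benchmark_score_bytes(1000, text))
```

A program that ships with more built-in knowledge pays for it in the first
term; one that learns from the data pays in the second term while it adapts.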

Compression is also consistent with Legg and Hutter's universal intelligence, 
i.e. expected reward of an AIXI universal agent in an environment simulated by 
a random program. Suppose you have a compression oracle that inputs any string 
x and outputs the shortest program that outputs a string with prefix x. Then 
this reduces the (uncomputable) AIXI problem to using the oracle to guess which 
environment is consistent with the interaction so far, and figuring out which 
future outputs by the agent will maximize reward.

Of course universal intelligence is also not testable because it requires an 
infinite number of environments. Instead, we have to choose a practical data 
set. I use Wikipedia text, which has fewer errors than average text, but I 
believe that is consistent with my goal of passing the preference test.

-- Matt Mahoney, [EMAIL PROTECTED]

