I have been developing an experimental test set along the lines of Legg and 
Hutter's universal intelligence (http://www.idsia.ch/idsiareport/IDSIA-04-05.pdf). They define general
intelligence as the expected reward of an AIXI agent in a Solomonoff 
distribution of environments (simulated by random Turing machines). AIXI is 
essentially a compression problem (find the shortest program consistent with 
the interaction so far). Thus, my benchmark is a large number (10^6) of small 
strings (1-32 bytes) generated by random Turing machines. The benchmark is 
here: http://cs.fit.edu/~mmahoney/compression/uiq/
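
To be concrete about the measurement, the idea is simply that a better compressor (a better approximation of the shortest consistent program) should achieve a smaller total encoding of the corpus. A rough sketch in Python, not the actual harness; the file name and the one-string-per-line format are only placeholders:

# Minimal sketch, not the actual benchmark harness: measure how well a
# stock compressor models the corpus by compressing the whole file.
# "uiq_strings.txt" and its format are assumptions for this sketch.
import zlib

def measure(path="uiq_strings.txt"):
    with open(path, "rb") as f:
        data = f.read()
    packed = zlib.compress(data, 9)
    print(f"{len(data)} bytes raw, {len(packed)} bytes compressed")

if __name__ == "__main__":
    measure()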

I believe I have solved the technical issues related to experimental 
uncertainty and to ensuring that the source is cryptographically random. My goal was to 
make it an open benchmark with verifiable results while making it impossible to 
hard-code any knowledge of the test data into the agent. Other benchmarks solve 
this problem by including the decompressor size in the measurement, but my 
approach makes this unnecessary. However, I would appreciate any comments.
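
For example, one way to get a source that is both reproducible and impossible to anticipate is to derive all of the random bits from a seed through a cryptographic hash and publish the seed only after submissions are in. A Python sketch of that general idea (the construction and names are assumptions, not necessarily what the benchmark does):

# Sketch only: expand a seed into a pseudorandom bit stream with SHA-256
# in counter mode.  Publishing the seed afterward lets anyone regenerate
# and verify the corpus, but nothing about it can be hard-coded ahead of
# time.  Names and construction are assumptions for this sketch.
import hashlib

def bit_stream(seed: bytes):
    counter = 0
    while True:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for i in range(7, -1, -1):
                yield (byte >> i) & 1
        counter += 1

# Example: draw the first 16 bits of a random program.
bits = bit_stream(b"seed-published-after-the-deadline")
program = [next(bits) for _ in range(16)]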

A couple of issues arose in designing the benchmark. One is that compression 
results are highly dependent on the choice of universal Turing machine, even 
though all machines are theoretically equivalent. The problem is that even 
though any machine can simulate any other by appending a compiler or 
interpreter, this small constant is significant in practice where the 
complexity of the programs is already small. I tried to create a simple but 
expressive language based on a 2-tape machine (a working tape plus an output 
tape, both one-sided and binary) with an instruction set that outputs a bit 
with each instruction. There are, of course, many options. I suppose I could 
take an experimental approach and look for languages that rank compressors in 
the same order as other benchmarks, but there doesn't seem to be a guiding 
principle.
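
To give the flavor of what I mean, here is a toy interpreter in Python for a machine of that general shape. The 3-bit instruction encoding is invented purely for this sketch and is not the language the benchmark actually uses:

# Toy illustration only -- not the benchmark's language.  A machine with
# a one-sided binary work tape and an output tape, where every executed
# instruction appends one bit to the output.  The 3-bit instruction
# encoding below is invented for this sketch.
def run(program_bits, max_steps=1000, max_output=256):
    if not program_bits:
        return []
    work = {}            # one-sided binary work tape, cells default to 0
    head = 0
    out = []
    pc = 0
    n = len(program_bits)
    for _ in range(max_steps):
        o  = program_bits[pc % n]        # bit appended to the output tape
        w  = program_bits[(pc + 1) % n]  # bit written under the work head
        mv = program_bits[(pc + 2) % n]  # head move: 0 = left, 1 = right
        out.append(o)
        work[head] = w
        head = max(0, head - 1) if mv == 0 else head + 1
        # crude conditional control: skip the next instruction when the
        # cell now under the head holds a 1
        pc += 6 if work.get(head, 0) else 3
        if len(out) >= max_output:
            break
    return out

The benchmark strings would then come from feeding many short random programs through something like this and keeping outputs in the 1-32 byte range.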

Also, it does not even seem possible to sample from a Solomonoff distribution. 
Legg proved in http://arxiv.org/abs/cs.AI/0606070 that there are strings that 
are hard to learn, but that the time to construct them grows as fast as the 
busy beaver function. Of course I can't create such strings in my benchmark. I 
can create algorithmically complex sources, but they are necessarily easy to 
learn (for example, 100 random bits followed by all zero bits).
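
To make that example concrete (the lengths are arbitrary):

# The parenthetical example above, made concrete: the 100-bit prefix is
# incompressible, so the string has high algorithmic complexity, yet
# after the prefix the best prediction is always 0.
import random

prefix = [random.getrandbits(1) for _ in range(100)]   # 100 random bits
source = prefix + [0] * 900                            # followed by all zeros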

Is it possible to test the intelligence of an agent without having at least as 
much computing power? Legg's paper seems to say "no".

-- Matt Mahoney, matmaho...@yahoo.com

