I have been developing an experimental test set along the lines of Legg and Hutter's universal intelligence ( http://www.idsia.ch/idsiareport/IDSIA-04-05.pdf ). They define general intelligence as the expected reward of an AIXI agent in a Solomonoff distribution of environments (simulated by random Turing machines). AIXI is essentially a compression problem (find the shortest program consistent with the interaction so far). Thus, my benchmark is a large number (10^6) of small strings (1-32 bytes) generated by random Turing machines. The benchmark is here: http://cs.fit.edu/~mmahoney/compression/uiq/
I believe I have solved the technical issues related to experimental uncertainty and ensuring the source is cryptographically random. My goal was to make it an open benchmark with verifiable results while making it impossible to hard-code any knowledge of the test data into the agent. Other benchmarks solve this problem by including the decompressor size in the measurement, but my approach makes this unnecessary. However, I would appreciate any comments. A couple of issues arose in designing the benchmark. One is that compression results are highly dependent on the choice of universal Turing machine, even though all machines are theoretically equivalent. The problem is that even though any machine can simulate any other by appending a compiler or interpreter, this small constant is significant in practice where the complexity of the programs is already small. I tried to create a simple but expressive language based on a 2 tape machine (working plus output, both one sided and binary) and an instruction set that outputs a bit with each instruction. There are, of course, many options. I suppose I could use an experimental approach of finding languages that rank compressors in the same order as other benchmarks. But there doesn't seem to be a guiding principle. Also, it does not seem even possible to sample a Solomonoff distribution. Legg proved in http://arxiv.org/abs/cs.AI/0606070 that there are strings that are hard to learn, but that the time to create them grows as fast as the busy beaver problem. Of course I can't create such strings in my benchmark. I can create algorithmically complex sources, but they are necessarily easy to learn (for example, 100 random bits followed by all zero bits). Is it possible to test the intelligence of an agent without having at least as much computing power? Legg's paper seems to say "no". -- Matt Mahoney, matmaho...@yahoo.com ------------------------------------------- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b Powered by Listbox: http://www.listbox.com