On Thu, Jul 6, 2023, 2:09 AM Rob Freeman <chaotic.langu...@gmail.com> wrote:
> Did the Hutter Prize move the field? Well, I was drawn to it as a rare
> data based benchmark.

Not much. I disagreed with Hutter on the contest rules, but he was funding the prize. (I once applied for an NSF grant, but it was rejected like 90% of applications.) My large text benchmark has no limits on CPU time or memory, but I understand the need for these when there is prize money. I also don't independently evaluate every submission. Still, I have collected about 200 entries with thousands of versions and options since 2006. People are motivated by competition and reputation alone. A top scoring open source entry can land you a good position as a data scientist. At least it did for me.

I always believed that neural networks and massive computation are the path to AGI. My PAQ based program, which used a neural network to combine context models for bitwise prediction, was the first winner of the Calgary compression challenge not based on PPM (which mixes byte predictions from the longest matching contexts). My LTCB benchmark also confirms this. The top program uses a transformer network that runs for about 2 weeks on 10K CUDA cores and 32 GB of memory.

The Hutter Prize hasn't run long enough to make any important advances. Mostly the entries are cut-down variants of LTCB entries that meet the time and memory constraints. I'm not sure what is in the current entry other than a lot of tweaks of the previous entry. That one improved compression by reordering the Wikipedia articles to move semantically related articles together, a technique that is more effective when memory is constrained.

I just always believed the goal of compression was wrong. Compression is only appropriate for evaluating deterministic language models. Human brains are good at text prediction but can't compress, because we can't precisely reproduce the same sequence of predictions during decompression. But this isn't a problem for transistor-based implementations.
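The PAQ-style mixing mentioned above can be sketched in a few lines. This is a minimal, hypothetical illustration (not PAQ's actual code): two fixed "context models" each predict P(next bit = 1), and a single-layer network combines them in the logistic domain, learning the weights online from the prediction error.

```python
import math

def stretch(p):
    # Logit transform: map a probability into the logistic domain.
    return math.log(p / (1.0 - p))

def squash(x):
    # Inverse logit: map back to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def mix(probs, weights):
    # Weighted sum of stretched probabilities, squashed back.
    return squash(sum(w * stretch(p) for w, p in zip(weights, probs)))

def update(probs, weights, bit, lr=0.01):
    # Online weight update proportional to prediction error.
    err = bit - mix(probs, weights)
    return [w + lr * err * stretch(p) for w, p in zip(weights, probs)]

# Toy bit stream that is mostly 1s; model A (p=0.7) fits it better
# than model B (p=0.4), so A's weight should grow relative to B's.
weights = [0.0, 0.0]
for bit in [1, 1, 0, 1, 1, 1, 0, 1]:
    probs = [0.7, 0.4]  # the two models' predictions for this bit
    weights = update(probs, weights, bit)
```

In PAQ the model predictions come from many contexts of different orders, and the mixer weights are themselves selected by context, but the stretch/mix/squash/update loop above is the core idea.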
Also, compression cannot be used to evaluate vision, speech, or robotic models because the information content is dominated by random noise, which cannot be compressed. Raw video is about 10^9 bits per second, of which only 5 to 10 bits are cognitively relevant and the rest can be discarded. You could theoretically write a lossy video compression algorithm that reduces video to a text-based script and then regenerates a new video close enough that you wouldn't notice the difference. But unlike lossless text compression, you still need human judges to evaluate the quality of the restored video.

Text prediction is sufficient to pass the Turing test. The model estimates the distribution of all possible responses to a question and selects the most likely answer. A text compressor makes the same predictions and assigns a code of length log(1/p) bits to each possible response with probability p. To decompress, it computes the distribution again and looks up the code. To evaluate an algorithm, we run the compressed self-extracting archive and compare with the original data. Computing a distribution is the same as prediction because, by the chain rule, the probability of a string is equal to the product of the conditional probabilities of each symbol (word, token, byte, or bit) given the previous symbols.

The LTCB and Hutter Prize entries model grammar and semantics to some extent but never developed to the point of constructing world models enabling them to reason about physics or psychology or solve novel math and coding problems. We now know this is possible in larger models without grounding in nonverbal sensory data, even though we don't understand how it happened. I suspect it is possible in 1 GB with some pre-training or hard-coded knowledge, and hasn't been done up to this point only because we didn't know it was possible at all.
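The equivalence of prediction and compression can be checked numerically. Below is a toy sketch (the bigram table is a hypothetical stand-in for a real predictive model): the total ideal code length, summing log2(1/p) bits per symbol as an arithmetic coder would, equals -log2 of the whole string's probability computed by the chain rule.

```python
import math

# Hypothetical bigram model: P(next symbol | previous symbol).
# None marks the start of the string.
MODEL = {
    (None, "a"): 0.5, (None, "b"): 0.5,
    ("a", "a"): 0.1, ("a", "b"): 0.9,
    ("b", "a"): 0.8, ("b", "b"): 0.2,
}

def string_probability(s):
    # Chain rule: P(s) = product over i of P(s[i] | s[i-1]).
    p, prev = 1.0, None
    for c in s:
        p *= MODEL[(prev, c)]
        prev = c
    return p

def ideal_code_length_bits(s):
    # An arithmetic coder spends ~log2(1/p) bits per symbol,
    # so the total is -log2 P(s).
    bits, prev = 0.0, None
    for c in s:
        bits += math.log2(1.0 / MODEL[(prev, c)])
        prev = c
    return bits

s = "abab"
assert math.isclose(ideal_code_length_bits(s),
                    -math.log2(string_probability(s)))
```

The same model run in the same order at decompression time reproduces the identical sequence of distributions, which is why the scheme requires the deterministic replay that brains lack but transistors provide.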
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T42db51de471cbcb9-M86421e0a9aa85ba9702cf864
Delivery options: https://agi.topicbox.com/groups/agi/subscription