On Mon, Sep 1, 2025, 11:26 PM Rob Freeman <[email protected]>
wrote:

> On Mon, Sep 1, 2025 at 11:54 AM Matt Mahoney <[email protected]>
> wrote:
>
>>
>> The model representation in memory is several times larger than the input
>>
>
> I just want to emphasize that line.
>
> What might be the theoretical limit in size, I wonder? Could there be no
> limit?
>

A Hopfield net stores 0.15 bits per connection. A server farm keeps
thousands of copies of the Linux kernel in RAM. The human brain stores 10^9
bits using 10^15 synapses. Your body stores 10^13 copies of your DNA. The
laws of physics probably have a few hundred bits of Kolmogorov complexity
but describe a biosphere with 10^37 bits of DNA in a universe with a
storage capacity of 10^90 bits and an entropy (Bekenstein bound of the
Hubble radius) of 2.95 x 10^122 bits.
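As a sanity check on that last figure, the holographic bound gives pi R^2 / (l_p^2 ln 2) bits for a sphere of Hubble radius R = c/H0. A back-of-envelope script (the constants and H0 ~ 70 km/s/Mpc are my assumptions; the exact prefactor depends on which H0 you use):

```python
import math

# Physical constants (SI units, approximate)
c    = 2.998e8       # speed of light, m/s
G    = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
hbar = 1.055e-34     # reduced Planck constant, J*s

# Assumed Hubble constant: ~70 km/s/Mpc converted to 1/s
Mpc = 3.086e22       # meters per megaparsec
H0  = 70e3 / Mpc     # s^-1

R   = c / H0                 # Hubble radius, ~1.3e26 m
lp2 = hbar * G / c**3        # Planck length squared, ~2.6e-70 m^2

# Holographic entropy bound: A/(4 lp^2) = pi R^2 / lp^2 nats
nats = math.pi * R**2 / lp2
bits = nats / math.log(2)
print(f"{bits:.2e} bits")    # on the order of 10^122
```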

So, yes, there is a limit, unless you include multiverse theories with an
infinite number of finite universes and an overall Kolmogorov complexity of
0. But even in our observable universe, there is no computer big enough to
simulate it, whether to predict tomorrow's lottery numbers or to test grand
unified theories.

But we are just talking about testing LLMs using lossless compression, and
I need to point out the limitations of this approach.

1. This test only works on deterministic computers, where you can reset to
an earlier state and reproduce the same sequence of predictions to
decompress a file. This is not possible with human brains.
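To illustrate point 1, here is a toy adaptive bit predictor (a minimal stand-in of my own, not any particular compressor). On a deterministic machine, replaying the model from the same initial state regenerates exactly the probabilities the compressor used, which is what lets an arithmetic decoder invert the code:

```python
def predictions(bits):
    """Adaptive order-0 model: p(next bit is 1) from counts seen so
    far, updated after every bit. Deterministic given the input."""
    c0, c1 = 1, 1                      # Laplace-smoothed counts
    probs = []
    for b in bits:
        probs.append(c1 / (c0 + c1))   # p(next bit is 1)
        if b: c1 += 1
        else: c0 += 1
    return probs

data = [0, 1, 1, 0, 1, 1, 1, 0]
# Compressor and decompressor replay the same model from the same
# initial state, so their probability streams match bit for bit.
assert predictions(data) == predictions(data)
```

A brain has no such reset button, so the decompressor could never reproduce the compressor's predictions.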

2. It only works with language. It is good enough for passing the Turing
test, but it does not work for vision or robotics. The problem with pixel
prediction in video is that most of the data is noise that is imperceptible
to the eye but would still have to be compressed. In theory you could
compress raw video (10^9 bits per second) to a text description (10 bits
per second) and decompress by asking an AI to generate another video that
looks about the same, but that would rely on subjective evaluation rather
than just comparing files.

3. LLM chatbots output the most likely continuation one bit or token at a
time, using each prediction as context for the next. But this greedy use of
the chain rule (p(xy) = p(x)p(y|x)) does not find the most likely sequence.
Suppose you have:

p(00) = .3
p(01) = .3
p(10) = 0
p(11) = .4

A language model would predict that the next bit is 0 with probability .6,
even though the most likely two-bit string is 11. Solving this requires
looking ahead and searching over the decision tree. Compression doesn't
distinguish between chatbots that do this well and those that do it poorly.
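A few lines make the failure concrete, using the distribution above:

```python
# Joint distribution over two bits, from the example above
p = {'00': 0.3, '01': 0.3, '10': 0.0, '11': 0.4}

# Greedy decoding: pick the most likely first bit from the marginal,
# then the most likely second bit given it.
p_first = {b: sum(v for k, v in p.items() if k[0] == b) for b in '01'}
b1 = max(p_first, key=p_first.get)                # '0' with prob 0.6
p_second = {b: p[b1 + b] / p_first[b1] for b in '01'}
b2 = max(p_second, key=p_second.get)
greedy = b1 + b2                                  # has probability only 0.3

# Searching the whole tree finds the most likely sequence instead.
best = max(p, key=p.get)                          # '11' with prob 0.4

print(greedy, best)
```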

4. Current chatbots have separate training and test phases so that the
parameters can be fixed and shared without leaking information between
users. Doing this in a compressor would make compression worse, because
compressors normally update the model after each prediction.
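The cost of freezing the model can be shown with ideal code lengths (the sum of -log2 p over the data). The 0.9-biased bit stream and the count-based model are my own toy assumptions; the frozen model is a uniform p(0) = p(1) = 0.5, standing in for parameters fixed before seeing this user's data:

```python
import math

# A biased bit stream: 900 zeros and 100 ones
data = [0] * 900 + [1] * 100

# Static model: parameters fixed in advance at p(0) = p(1) = 0.5,
# as if trained once and then frozen for all users.
static_bits = sum(-math.log2(0.5) for _ in data)   # exactly 1000 bits

# Adaptive model: Laplace-smoothed counts updated after each bit,
# the way compressors normally work.
c = [1, 1]
adaptive_bits = 0.0
for b in data:
    adaptive_bits += -math.log2(c[b] / (c[0] + c[1]))
    c[b] += 1

# The adaptive model learns the bias as it goes and codes the same
# stream in roughly half the bits.
print(f"static: {static_bits:.0f} bits, adaptive: {adaptive_bits:.0f} bits")
```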

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta9b77fda597cc07a-Mab6442134a46b301a1e05467
Delivery options: https://agi.topicbox.com/groups/agi/subscription
