On Mon, Sep 1, 2025, 11:26 PM Rob Freeman <[email protected]> wrote:
> On Mon, Sep 1, 2025 at 11:54 AM Matt Mahoney <[email protected]> wrote:
>
>> The model representation in memory is several times larger than the input
>
> I just want to emphasize that line.
>
> What might be the theoretical limit in size, I wonder? Could there be no limit?

A Hopfield net stores 0.15 bits per connection. A server farm keeps thousands of copies of the Linux kernel in RAM. The human brain stores 10^9 bits using 10^15 synapses. Your body stores 10^13 copies of your DNA. The laws of physics probably have a few hundred bits of Kolmogorov complexity, but they describe a biosphere with 10^37 bits of DNA in a universe with a storage capacity of 10^90 bits and an entropy (the Bekenstein bound of the Hubble radius) of 2.95 x 10^122 bits.

So, yes, there is a limit, unless you include multiverse theories with an infinite number of finite universes and an overall Kolmogorov complexity of 0. But even in our observable universe, there is no computer big enough to simulate it to predict tomorrow's lottery numbers or to test grand unified theories.

But we are just talking about testing LLMs using lossless compression, and I need to point out the limitations of this approach.

1. This test only works on deterministic computers, where you can reset to an earlier state and reproduce the same sequence of predictions to decompress a file. This is not possible with human brains.

2. This only works with language. It is good enough for passing the Turing test, but it does not work for vision or robotics. The problem with pixel prediction in video is that most of the data is noise that is not perceptible to the eye but would still have to be compressed. In theory you could compress raw video (10^9 bits per second) to a text description (10 bits per second) and decompress by asking an AI to generate another video that looks about the same. But that relies on subjective evaluation rather than just comparing files.

3. LLM chatbots should output the most likely continuation, which means they can't simply use the chain rule (p(xy) = p(x)p(y|x)) to predict one bit or token at a time and feed it back as context for the next prediction. Suppose you have:

p(00) = .3
p(01) = .3
p(10) = 0
p(11) = .4

A language model would predict that the next bit is 0 with probability .6, even though the correct response is 11. Solving this requires looking ahead and searching over the decision tree. Compression doesn't distinguish between chatbots that do this well and ones that do it poorly.

4. Current chatbots have separate training and test phases so that the parameters can be fixed and shared without leaking information between users. Doing this in a compressor would make compression worse, because compressors normally update the model after each prediction.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta9b77fda597cc07a-Mab6442134a46b301a1e05467
Delivery options: https://agi.topicbox.com/groups/agi/subscription
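P.S. The greedy-versus-search point in item 3 can be checked with a few lines of Python. This is just an illustrative sketch of the toy distribution above, not any particular model:

```python
# Joint distribution over two-bit sequences from point 3.
p = {"00": 0.3, "01": 0.3, "10": 0.0, "11": 0.4}

def marginal(prefix):
    """Probability that the sequence starts with the given prefix."""
    return sum(q for seq, q in p.items() if seq.startswith(prefix))

# Greedy decoding: pick the most likely bit at each step,
# conditioning on the bits chosen so far (chain rule).
greedy = ""
for _ in range(2):
    greedy += max("01", key=lambda b: marginal(greedy + b))

# Exhaustive search: pick the most likely full sequence.
best = max(p, key=p.get)

print(greedy)  # "00" - greedy commits to 0 because p(0*) = .6
print(best)    # "11" - but the single most likely sequence is 11
```

Greedy decoding commits to 0 after the first step and can never reach 11, which is exactly why a search over the decision tree is needed.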

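P.P.S. Item 4, that compressors normally update the model after each prediction, can be illustrated with a minimal adaptive bit predictor. This is a sketch using Laplace-smoothed counts, not the model of any real compressor:

```python
import math

def code_length(bits, adaptive=True):
    """Ideal code length in bits: the sum of -log2 p(bit | history).

    With adaptive=True the counts are updated after every prediction,
    as compressors normally do. With adaptive=False the model is
    frozen at uniform p = 1/2, like fixed post-training parameters.
    """
    counts = [1, 1]  # Laplace-smoothed counts of 0s and 1s
    total = 0.0
    for b in bits:
        p = counts[b] / sum(counts) if adaptive else 0.5
        total += -math.log2(p)
        if adaptive:
            counts[b] += 1  # online update after each prediction
    return total

data = [1] * 100  # a highly redundant input
print(code_length(data, adaptive=False))  # 100.0 bits: no learning
print(code_length(data, adaptive=True))   # ~6.7 bits: the model adapts
```

Freezing the model costs a full bit per symbol on this input; the online-updated model quickly learns the bias, which is why fixing the parameters would make compression worse.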