On Sun, Aug 31, 2025, 3:16 AM Rob Freeman <[email protected]>
wrote:

> On Sun, Aug 31, 2025 at 9:05 AM Matt Mahoney <[email protected]>
> wrote:
>
>> On Fri, Aug 29, 2025 at 10:45 PM Rob Freeman <[email protected]>
>> wrote:
>> > The contribution of the Hutter Prize to our knowledge has been barren.
>> It didn't find the semantic primitives Hutter envisioned for it.
>>
>> The purpose is not to find semantic primitives. It is to find efficient
>> algorithms for language modeling.
>>
>
> A casual search pulls up a stated goal of "compressing human knowledge".
> Does "compressing human knowledge" differ from finding semantic primitives?
>

Yes. Human knowledge also includes a model of the world as we see it. You
need that to predict text.

Now "compressing human knowledge" is reframed to be "find efficient
> algorithms for language modeling"?
>

The hard part of both problems is text prediction. A compressor assigns
codes of length log 1/p to symbols with probability p. An LLM outputs the
most likely continuation. Both of those steps are easy. The reason that
compression seems unnatural is that human brains can't be reset to an
earlier state to exactly reproduce the same sequence of predictions to
decompress a file.
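To make the code-length point concrete, here is a minimal sketch in Python. The predicted probabilities are made-up illustrative numbers, not the output of any real compressor; the point is just that a symbol predicted with probability p ideally costs log2(1/p) bits, so better prediction means smaller output.

```python
import math

def code_length_bits(p: float) -> float:
    # Ideal code length for a symbol the model assigns probability p.
    return math.log2(1.0 / p)

# Hypothetical per-symbol probabilities a model assigned to the symbols
# it actually saw (illustrative values only).
predictions = [0.5, 0.5, 0.25]

total_bits = sum(code_length_bits(p) for p in predictions)
print(total_bits)  # 1 + 1 + 2 = 4 bits
```

An arithmetic coder gets within a fraction of a bit of this total, which is why a compressor's output size is a direct measure of its predictor's quality.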

Also, you can't compress with an LLM served over an API, because the API
doesn't give you a probability distribution over all possible responses.

>
> In practice the biggest change to this compression model in 20 years has
> been to allow models to be bigger?
>

The biggest change was PAQ-style context mixing replacing PPM at the top of
the benchmarks. PPM predicts a byte at a time instead of a bit at a time
and is more memory-efficient, but it is restricted to contiguous contexts.
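A minimal sketch of the PAQ idea, in Python: several models each predict the probability that the next bit is 1, and a mixer combines them in the logistic domain with weights updated online toward the bit actually seen. The model probabilities and learning rate below are hypothetical values for illustration, not from any particular PAQ version.

```python
import math

def stretch(p: float) -> float:
    # Map probability to the logistic (log-odds) domain.
    return math.log(p / (1.0 - p))

def squash(x: float) -> float:
    # Inverse of stretch: log-odds back to probability.
    return 1.0 / (1.0 + math.exp(-x))

# Two toy model predictions for "next bit is 1" (illustrative values).
p1, p2 = 0.9, 0.6
w1, w2 = 0.5, 0.5  # mixer weights, learned online in a real compressor

# Mix in the logistic domain, as PAQ-style mixers do.
p_mix = squash(w1 * stretch(p1) + w2 * stretch(p2))

# After coding the actual bit, nudge each weight by a gradient step on
# the coding cost: weights of models that predicted well grow.
bit, lr = 1, 0.02
err = bit - p_mix
w1 += lr * err * stretch(p1)
w2 += lr * err * stretch(p2)
```

Because the inputs are arbitrary bit predictors, the contexts they use don't have to be contiguous byte histories, which is the flexibility PPM lacks.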


> But this is an old argument. Right at the beginning of the Hutter Prize I
> used to argue with you that language models would be characterized by
> getting bigger, not compression. 10 years later we got LLMs. Not known for
> their compactness. But OK, arguments over this go round in circles over
> what all the words "mean". There will be some way to argue an LLM is a
> compression of something, I'm sure. Yeah, they're a compression of what
> they generate, even if they themselves are bigger than what generates
> them...
>

The model representation in memory is several times larger than the input,
but that is not what we care about. Adding memory generally improves both
speed and compression in a three-way tradeoff.

>
> And then the next advance is to go from supervised NNs to unsupervised.
> Less structure, or just less imposed structure?
>

Text prediction is supervised learning with immediate feedback.

Prediction measures intelligence. Compression measures prediction.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta9b77fda597cc07a-M10495e7ca2f24f8ebda49235
Delivery options: https://agi.topicbox.com/groups/agi/subscription
