On Fri, Aug 29, 2025 at 10:45 PM Rob Freeman <[email protected]>
wrote:
> The contribution of the Hutter Prize to our knowledge has been barren. It
> didn't find the semantic primitives Hutter envisioned for it.

The purpose is not to find semantic primitives. It is to find efficient
algorithms for language modeling. Recall that intelligence (as Turing
defined it) requires nothing more than text prediction, and that compression
measures prediction accuracy. It turns out that the top compressors use neural
networks (specifically a transformer) to predict tokens (semantic
primitives).
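The link between prediction and compression can be made concrete: a model that assigns probability p to each symbol that actually occurs can code that symbol in -log2(p) bits, and an arithmetic coder achieves this bound to within a couple of bits over the whole sequence. A minimal sketch (the probability values are made up for illustration):

```python
import math

def code_length_bits(probs):
    """Ideal compressed size in bits for a sequence whose observed
    symbols were assigned the given predicted probabilities
    (Shannon code length, -sum of log2 p)."""
    return -sum(math.log2(p) for p in probs)

# A better predictor assigns higher probability to what actually
# occurs, which directly yields a smaller compressed output.
confident = [0.9, 0.8, 0.95]   # hypothetical per-symbol probabilities
uncertain = [0.5, 0.5, 0.5]    # a coin-flip model: 1 bit per symbol

assert code_length_bits(confident) < code_length_bits(uncertain)
```

This is why ranking compressors by output size on the same text is equivalent to ranking their language models by prediction quality.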

The reason there are two contests using the same 1 GB text file is that
Hutter and I disagreed on the contest rules. My benchmark has no limits on
CPU time or memory (and no prize money), while his has to run in 50 hours
on a single CPU thread with 10 GB of memory and no GPU. That is woefully
insufficient for practical LLMs, but we didn't know that in 2006 when I
created the benchmark. We fairly quickly saw that more computing power
always helps, with apparently no upper limit.

Most of the Hutter Prize winners are incremental improvements to CMIX that
optimize time and memory to meet the contest rules. CMIX uses dictionary
tokenization followed by PAQ-style context modeling, in which bitwise
predictions from multiple indirect context models are mixed by single-layer
neural networks trained to weight the best models. An indirect context
model maps a context hash to a bit history (an 8-bit state) and then to a
prediction. The predictions are combined in the logistic domain, x =
ln(p/(1 - p)), then the weighted sum is squashed by the inverse function
p = 1/(1 + e^(-w·x)), where w is the weight vector. Later improvements
replaced some of the context models with PPM, which predicts at the byte
level and saves memory. Still later versions sort the articles by topic to
improve compression.
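The logistic mixing step can be sketched in a few lines. This is a simplified toy, not CMIX itself: the two "models" and their probabilities below are invented, and real PAQ mixers select weight sets by context. It shows the stretch/squash pair and the online weight update that learns which model predicts best:

```python
import math

def stretch(p):
    """Map a probability to the logistic domain: x = ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def squash(x):
    """Inverse of stretch: p = 1 / (1 + e^-x)."""
    return 1 / (1 + math.exp(-x))

class LogisticMixer:
    """Toy PAQ-style mixer: combine bitwise predictions from several
    models with one layer of weights, trained online to minimize
    coding cost (simplified sketch, not the CMIX implementation)."""
    def __init__(self, n_models, lr=0.01):
        self.w = [0.0] * n_models  # one weight per input model
        self.lr = lr
        self.x = [0.0] * n_models  # last stretched inputs

    def predict(self, probs):
        self.x = [stretch(p) for p in probs]
        return squash(sum(w * x for w, x in zip(self.w, self.x)))

    def update(self, p, bit):
        # Gradient of the coding cost -log p(bit) with respect to
        # each weight reduces to (bit - p) * x_i in this domain.
        err = bit - p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]

# Hypothetical usage: model 0 correlates with the bit, model 1 is
# uninformative (always 0.5). The mixer learns to favor model 0.
mixer = LogisticMixer(2)
bits = [1, 1, 0, 1, 1, 1, 0, 1] * 50
for bit in bits:
    probs = [0.9 if bit == 1 else 0.2, 0.5]
    p = mixer.predict(probs)
    mixer.update(p, bit)
# After training, mixer.w[0] > mixer.w[1]: the informative model wins.
```

The logistic domain is what makes simple weighted averaging work well here: confident predictions near 0 or 1 map to large-magnitude x values, so one confident model can dominate many uncertain ones.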

This is why we need both benchmarks: one for biologically plausible models
like big LLMs, and one for your phone.

-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta9b77fda597cc07a-Mb1052b23789ae3e6098ce724
Delivery options: https://agi.topicbox.com/groups/agi/subscription
