On Fri, Aug 29, 2025 at 10:45 PM Rob Freeman <[email protected]> wrote:

> The contribution of the Hutter Prize to our knowledge has been barren. It didn't find the semantic primitives Hutter envisioned for it.
The purpose is not to find semantic primitives. It is to find efficient algorithms for language modeling. Recall that intelligence (as Turing defined it) requires nothing more than text prediction, and that compression measures prediction. It turns out that the top compressors use neural networks (specifically transformers) to predict tokens (semantic primitives).

The reason there are two contests using the same 1 GB text file is that Hutter and I disagreed on the contest rules. My benchmark has no limits on CPU time or memory (and no prize money), while his entries have to run in 50 hours on a single CPU thread with 10 GB of memory and no GPU. That is woefully insufficient for practical LLMs, but we didn't know that in 2006 when I created the benchmark. We fairly quickly saw that more computing power always helps, with apparently no upper limit.

Most of the Hutter Prize winners are incremental improvements to CMIX that optimize for time and memory to meet the contest rules. CMIX uses dictionary tokenization followed by PAQ-style context modeling, in which the bitwise predictions of multiple indirect context models are averaged by single-layer neural networks trained to select the best models. An indirect context model maps a context hash to a bit history (an 8-bit state) and then to a prediction. The predictions are averaged in the logistic domain, x = ln(p/(1-p)), and the weighted sum is then squashed by the inverse function p = 1/(1 + e^(-w·x)), where w is the weight vector. Later improvements replaced some of the context models with PPM, which predicts at the byte level and saves memory. Still later versions sort the articles by topic to improve compression.

This is why we need both benchmarks. One for biologically plausible models for big LLMs, and one for your phone.
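The claim that compression measures prediction can be made concrete: under ideal arithmetic coding, a model pays -log2 p bits for each symbol it assigned probability p to, so the compressed size is just the model's cumulative prediction cost. A minimal sketch of that relationship (my own illustration, not code from any contest entry):

```python
import math

def code_length_bits(probs):
    """Ideal arithmetic-coding cost in bits: -sum of log2 p, where
    each p is the probability the model assigned to the symbol
    that actually occurred. Better prediction -> fewer bits."""
    return -sum(math.log2(p) for p in probs)

# A model that assigns p = 0.9 to each of 8 symbols as they occur
# compresses them to fewer bits than one that assigns p = 0.5
# (which costs exactly 1 bit per symbol).
confident = code_length_bits([0.9] * 8)
uncertain = code_length_bits([0.5] * 8)
```

This is why ranking compressors by output size ranks their underlying models by prediction quality.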
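As a toy illustration of an indirect context model (my own sketch, not CMIX code: the real thing uses an 8-bit bit-history state machine and hashed lookup tables, which I replace here with plain 0/1 counts in a Python dict):

```python
class IndirectContextModel:
    """Simplified indirect context model: a context hash selects a
    per-context state (here, counts of observed 0s and 1s standing
    in for an 8-bit bit history), and that state maps to a bit
    prediction."""

    def __init__(self):
        self.table = {}  # context hash -> (count of 0s, count of 1s)

    def predict(self, ctx_hash):
        n0, n1 = self.table.get(ctx_hash, (0, 0))
        # Krichevsky-Trofimov-style estimate; 0.5 when nothing seen.
        return (n1 + 0.5) / (n0 + n1 + 1.0)

    def update(self, ctx_hash, bit):
        n0, n1 = self.table.get(ctx_hash, (0, 0))
        self.table[ctx_hash] = (n0 + (bit == 0), n1 + (bit == 1))
```

The indirection is the point: many contexts that behave alike share the same few history states, so the state-to-prediction mapping adapts faster than a direct per-context counter would.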
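The logistic mixing step can be sketched as follows. This is a simplified single mixer of my own (real PAQ/CMIX mixers select weight sets by context and use fixed-point arithmetic, omitted here): each model's probability is stretched to x = ln(p/(1-p)), the weighted sum is squashed back with p = 1/(1 + e^(-w·x)), and after the actual bit is seen the weights take a gradient step that favors the models that predicted well.

```python
import math

def stretch(p):
    return math.log(p / (1.0 - p))

def squash(x):
    return 1.0 / (1.0 + math.exp(-x))

class LogisticMixer:
    """Single-layer neural-network mixer in the logistic domain."""

    def __init__(self, n_models, lr=0.01):
        self.w = [0.0] * n_models  # one weight per input model
        self.lr = lr

    def mix(self, probs):
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        # Gradient step on coding cost: w_i += lr * x_i * (bit - p).
        err = bit - self.p
        for i, xi in enumerate(self.x):
            self.w[i] += self.lr * xi * err
```

For example, if model 0 keeps predicting 1 with p = 0.9 while the bit is in fact 1, repeated update(1) calls grow w[0], so the mixer learns to trust the accurate model.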
--
Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta9b77fda597cc07a-Mb1052b23789ae3e6098ce724
Delivery options: https://agi.topicbox.com/groups/agi/subscription
