Not for any prize, but noteworthy. The axiomatic foundations of the protoscientific BNUT (a nonlocal unified theory), expressed as equations and derivations, compress to 316 bits. Actually, 312 bits, arbitrarily padded to 316 bits. Why? Mathematically, it generally fits better overall.
When compared to other unified theories, such as string theory, this screams "Impossible!" Yet, provably, it does. There's a point in there worth noting: Turing isn't the benchmark anymore. As quantum theory has matured, Turing has been surpassed. Why try to force one more horsepower from an outdated engine? Redesign the engine to fit where the engine world is heading. Science does indeed progress at its own pace. Turing was a genius pioneer, not the ultimate standard; I doubt he would've thought otherwise. The question is: are we ready for the quantum revolution about to hit us? Well, it is.

On Fri, 16 Jan 2026, 05:24 Matt Mahoney, <[email protected]> wrote:

> In other news, my Hutter prize preprocessor plus a custom ZPAQ model
> compresses enwik9 from 1000 MB to 145 MB in 13 minutes using 4.5 GB of
> memory, which places it near the Pareto frontier on the large text
> benchmark.
> https://encode.su/threads/4467-enwik9-preprocessor#post86938
>
> There is only a minor change to the preprocessor in step C. The steps are:
> A - article sorting by topic.
> B - basic XML decoding to extract text and headers into separate streams.
> C - capitalization and space modeling and escape coding of rare
> characters. The idea is to split the stream into tokens with independent
> semantics. Capital letters are coded as a special character followed by
> lower case. Then the first letter after a space is coded as upper case and
> the space is removed.
> D - dictionary encoding. Each of 256 byte values decodes to a common group
> of letters found by byte pair encoding, restricted to parts of a word,
> single digit, common punctuation, space (not all are removed), or newline.
> This finds common suffixes like -s, -ed, -ing, etc., which are tokens in
> themselves.
>
> These steps reduce enwik9 from 1000 MB to 580 MB in about 2 minutes, which
> speeds up the downstream context model and reduces memory usage.
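The capitalization and space transform in step C can be sketched as below. This is a toy illustration, not Matt's actual code: the escape byte `\x01` is a hypothetical choice, and escape coding of rare characters is not shown.

```python
ESC = "\x01"  # hypothetical escape byte meaning "next letter was capital"

def encode(text):
    # Stage 1: code each capital letter as ESC followed by its lower case.
    s = "".join(ESC + c.lower() if c.isupper() else c for c in text)
    # Stage 2: remove each space before a letter, coding that letter as upper case.
    out, i = [], 0
    while i < len(s):
        if s[i] == " " and i + 1 < len(s) and s[i + 1].islower():
            out.append(s[i + 1].upper())
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

def decode(coded):
    # Undo stage 2: an upper case letter decodes to space + lower case.
    s = "".join(" " + c.lower() if c.isupper() else c for c in coded)
    # Undo stage 1: ESC + letter decodes to the capital letter.
    out, i = [], 0
    while i < len(s):
        if s[i] == ESC:
            out.append(s[i + 1].upper())
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

After this transform, "The cat" becomes "\x01theCat": the word-initial capitals that were forced by grammar are now carried by the token boundary itself, so the downstream model sees one token per word instead of a space/case distinction splitting its statistics.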
> The output is then compressed with zpaqd, a ZPAQ development tool that I
> wrote in 2014 that takes a config file that describes the context mixing
> architecture and code in ZPAQL, a virtual machine byte code, to generate
> the contexts. I wrote an order 0-1-2-3-4-6 byte ICM-ISSE chain, order 0-1
> word chain, match model, and a final order 0 mixer, whose output is
> arithmetic coded.
>
> An ICM is an indirect context model. It maps a context to an 8 bit state
> representing a count of 0s and 1s and the most recent bit seen in that
> context. That state is mapped to a table of predictions that is updated to
> reduce the prediction error by 0.1%. An order n context means the last n
> whole bytes plus the bits coded so far in the current byte.
>
> An ISSE is an indirect secondary symbol estimator. It mixes the stretched
> previous prediction from the next lower order context with the constant 1
> by weighted averaging and squashes the output to a prediction in the range
> 0 to 1. The weight is selected by a hash of the current context and is
> adjusted to reduce the prediction error by 0.1%. A prediction is stretched
> by x = ln(p/(1-p)) and squashed by the inverse, p = 1/(1 + e^-x). This
> makes a mixer a neural network with no hidden layer. In a word chain, the
> context is a hash of the previous word (for order 1) and the partial word
> bits coded so far, skipping any non letters.
>
> A match model searches for earlier long context matches using a hash table
> and predicts whatever bit came next, weighted by the length of the match.
>
> A mixer is a 2 layer neural network (no hidden layer) that weights the
> stretched predictions from all the other components and outputs the
> squashed weighted sum as the final bit prediction. The weights are updated
> to reduce the prediction error. In an order 0 mixer, the weight vector is
> selected by the order 0 context including the partial current byte.
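The stretch/squash pair and the mixer update described above can be sketched as follows. This is a toy floating-point version under assumed parameters (the learning rate `lr` is illustrative); the real ZPAQ mixer works in fixed point with weight vectors selected by context.

```python
import math

def stretch(p):
    # x = ln(p / (1 - p))
    return math.log(p / (1.0 - p))

def squash(x):
    # p = 1 / (1 + e^-x), the inverse of stretch
    return 1.0 / (1.0 + math.exp(-x))

def mix(predictions, weights):
    # Weighted sum of stretched predictions, squashed back to a probability:
    # a logistic neural network with no hidden layer.
    return squash(sum(w * stretch(p) for w, p in zip(weights, predictions)))

def update(predictions, weights, bit, lr=0.002):
    # Adjust weights to reduce the prediction error (bit - p),
    # a gradient step on the logistic loss as in PAQ-style mixers.
    p = mix(predictions, weights)
    err = bit - p
    for i, pi in enumerate(predictions):
        weights[i] += lr * err * stretch(pi)
```

Since squash is the inverse of stretch, a single component with weight 1 passes through unchanged; the interesting behavior is that components whose stretched predictions correlate with the actual bits accumulate larger weights over time.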
>
> The current leader on the large text benchmark is nncp, at 110 MB in 60
> hours on an RTX-3090 GPU using a transformer network. The Hutter prize
> winner is fx2-cmix at 113 MB including the decompressor executable, limited
> to 70 hours and 10 GB in a single thread with no GPU. It is a context
> mixing algorithm like mine (using some of my PAQ code) but mixing many
> hundreds of models instead of just 10.
> https://mattmahoney.net/dc/text.html
>
> -- Matt Mahoney, [email protected]
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> +
> delivery options <https://agi.topicbox.com/groups/agi/subscription>
> Permalink
> <https://agi.topicbox.com/groups/agi/T0518db1e3a0c25c5-Me78ccf134be65e0b2445bf3c>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T0518db1e3a0c25c5-Mcb0153bbcc4f6d6da2c28404
Delivery options: https://agi.topicbox.com/groups/agi/subscription
