On Wed, Dec 10, 2025 at 4:59 AM <[email protected]> wrote:
>
> Oh. My algorithm does something similar to SSE. But I think it's a bit 
> different. The priming I use considers how far back each character is in the 
> last ~500 characters, and also use what you called MATCH but that's for 
> longer than order 0 (and I do that for the first few orders too).
>
> I have at least 1 more question for you though. How does your algorithm (if it
> does?) handle time delay matching? Here's an example of the problem: You've 
> seen in the past "the large large large um um um cat chewed" and you see in 
> the future "the cat ?_?" and can in theory use the old memory's prediction - 
> but you need to match that old memory - how does your algorithm handle this 
> or does it not use these?

zpaq has some advanced options for contexts with fixed length gaps. It
also has models that map word sequences to lower case and skip spaces
and punctuation. Beyond that would require writing code in ZPAQL in a
config file and compressing with zpaqd, but I haven't done this. Here
is a simple example using the advanced options described in
https://mattmahoney.net/dc/zpaqdoc.html

C:\tmp>zpaq a "" "2024pre-processed enwik5.txt" -ms4.0ci1.1.1.1am
zpaq v7.15 journaling archiver, compiled Aug 17 2016
Creating  at offset 0 + 0
Adding 0.054804 MB in 1 files -method s4.0ci1.1.1.1am -threads 2 at
2025-12-12 00:04:19.
+ 2024pre-processed enwik5.txt 54804
0 + (54804 -> 22917) = 22917
0.046 seconds (all OK)

On the first line, the archive name is "", which means to just report
the compressed size and discard the output. The option
-ms4.0ci1.1.1.1am means the following:

-m selects the compression method.

s selects streaming mode, which means simple compression with no
dedupe or rollback.

4 selects a block size of 2^4 MB. This also controls how much memory
is used by the other components.

.0 means no preprocessing. You can also select LZ77, BWT, and E8E9,
but these make compression worse in this case.

c selects an indirect context model (ICM) with an order 0 context. It
maps the previous bits of the current byte to an 8 bit state
representing counts of 0 and 1 bits and the last bit. That state is
then mapped to a probability by a second table. After coding each bit,
both tables are updated: the first to the new state, the second to a
probability that reduces the prediction error by 0.1%.
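As a rough illustration (not zpaq's actual implementation), the two
tables of an ICM can be sketched in Python. The 0.1% update rate is
from the description above; the dictionaries and the bounded-count
"state" are simplifications of zpaq's packed tables and 8-bit state
machine:

```python
class ICM:
    """Sketch of an indirect context model. zpaq packs bit history
    into an 8-bit state machine; here the state is just bounded
    counts of 0 and 1 bits plus the last bit."""
    def __init__(self):
        self.state = {}   # context -> bit history state (first table)
        self.prob = {}    # state -> P(next bit = 1)     (second table)

    def predict(self, ctx):
        s = self.state.get(ctx, (0, 0, 0))
        return self.prob.get(s, 0.5)

    def update(self, ctx, bit):
        s = self.state.get(ctx, (0, 0, 0))
        p = self.prob.get(s, 0.5)
        # adjust the second table to reduce the error by 0.1%
        self.prob[s] = p + (bit - p) * 0.001
        n0, n1, _ = s
        # bounded counts stand in for the 8-bit state machine
        self.state[ctx] = (min(n0 + (bit == 0), 40),
                           min(n1 + (bit == 1), 40), bit)
```

The indirection is the point: contexts that produce the same bit
history share one probability, so statistics pool across contexts.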

i1.1.1.1 selects 4 ISSE components with each one being 1 order higher
than the previous, i.e. orders 1, 2, 3, and 4. These are hashes that
include the previous bits of the current byte as well. An ISSE takes
the output of the previous component and mixes it with a constant 1,
where the two mixer weights are selected by the state as in an ICM.
Mixing is by weighted summation in the logistic domain, i.e. stretch(p)
= log(p/(1-p)), which can be negative. Then the state is updated and
the two weights are updated to reduce the prediction error by 0.1%.
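A minimal sketch of one ISSE stage, with a dict in place of zpaq's
state table. Only the logistic-domain mix with a constant input and
the 0.1% update rate come from the description above; the initial
weights are a guess:

```python
import math

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

class ISSE:
    """Sketch: mix the previous component's prediction with a
    constant 1 in the logistic domain, the weight pair selected
    by the bit history state (as in an ICM)."""
    def __init__(self):
        self.weights = {}  # state -> [w_input, w_const]

    def predict(self, state, p_prev):
        w = self.weights.setdefault(state, [1.0, 0.0])  # guessed init
        return squash(w[0] * stretch(p_prev) + w[1] * 1.0)

    def update(self, state, p_prev, p, bit, rate=0.001):
        err = (bit - p) * rate   # reduce prediction error by 0.1%
        w = self.weights[state]
        w[0] += err * stretch(p_prev)
        w[1] += err * 1.0
```

With the guessed initial weights [1, 0] the stage starts out passing
its input through unchanged, then learns a per-state correction.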

a selects a match model. It uses a hash table to find the last
occurrence of an order 8 context match and predicts whatever bit came
next, weighted by the actual length of the match.
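Here is a byte-oriented sketch of that idea, for clarity only: zpaq
hashes the context into a fixed table and predicts bit by bit,
weighting by match length, while this version uses a Python dict and
whole bytes:

```python
class MatchModel:
    """Sketch of a match model: remember where each order-8
    context last occurred and predict the byte that followed it,
    along with the current match length."""
    def __init__(self):
        self.table = {}           # order-8 context -> position after it
        self.history = bytearray()
        self.ptr = None           # position of the predicted byte
        self.match_len = 0

    def predict(self):
        if self.ptr is not None and self.ptr < len(self.history):
            return self.history[self.ptr], self.match_len
        return None, 0

    def update(self, byte):
        pred, _ = self.predict()
        if pred is not None and pred == byte:
            self.match_len += 1    # extend the current match
            self.ptr += 1
        else:
            self.match_len = 0
            self.ptr = None
        self.history.append(byte)
        if len(self.history) >= 8:
            ctx = bytes(self.history[-8:])
            if self.ptr is None:
                self.ptr = self.table.get(ctx)  # try to start a match
            self.table[ctx] = len(self.history)
```

Feeding it "the cat sat. the cat s" leaves the last 8 bytes matching
an earlier occurrence, so it predicts the byte that followed ("a").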

m selects a mixer that combines the predictions of all of the previous
components using a simple neural network with no hidden layer. The
mixer weights are selected by the order 0 context, i.e. the previous
bits of the current byte. The weighted averaging is in the logistic
domain as before. The output is converted back to a probability by
squash(x) = 1/(1+e^-x), the inverse of stretch.
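The mixer can be sketched the same way. The logistic-domain averaging
and context-selected weights are from the description above; the
initial weight of 0.3 and the learning rate are illustrative:

```python
import math

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

class Mixer:
    """Sketch: a no-hidden-layer network that averages stretched
    predictions, with the weight vector selected by context and
    trained online to reduce coding error."""
    def __init__(self, n, rate=0.001):
        self.w = {}        # context -> weight vector
        self.n = n
        self.rate = rate

    def mix(self, ctx, probs):
        w = self.w.setdefault(ctx, [0.3] * self.n)  # guessed init
        return squash(sum(wi * stretch(p) for wi, p in zip(w, probs)))

    def update(self, ctx, probs, p, bit):
        err = (bit - p) * self.rate
        w = self.w[ctx]
        for i, pi in enumerate(probs):
            w[i] += err * stretch(pi)   # gradient step per input
```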

The last 2 lines show that the file was compressed from 54804 bytes to
22917 in 46 milliseconds.

You can get better compression by adding an SSE to the final output.

-ms4.0ci1.1.1.1ams  -> 22895 (47 ms)
-ms4.0ci1.1.1.1amst -> 22874 (62 ms)

s is the SSE (or APM in PAQ). It selects a new prediction from a 2-D
table indexed by the order 0 context and by the old prediction,
stretched, quantized to 32 levels, and interpolated between the two
nearest entries. It is updated by reducing the prediction error of
the closest entry by 0.1%.
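A sketch of that table, under stated assumptions: the 32 levels,
interpolation, and 0.1% update are from the description above, but the
[-8, 8] clamp on the stretched input and the identity initialization
are guesses, not zpaq's constants:

```python
import math

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

class SSE:
    """Sketch: refine a prediction via a table indexed by context
    and the quantized stretched input, interpolating between the
    two nearest of 32 levels."""
    LEVELS = 32

    def __init__(self):
        self.t = {}   # (context, level) -> output probability

    def _pos(self, p):
        x = max(-8.0, min(7.999, stretch(p)))   # assumed clamp range
        scaled = (x + 8.0) * (self.LEVELS - 1) / 16.0
        i = int(scaled)
        return i, scaled - i

    def _default(self, level):
        # initialize entries so the table starts as the identity map
        return squash(level * 16.0 / (self.LEVELS - 1) - 8.0)

    def refine(self, ctx, p):
        i, frac = self._pos(p)
        lo = self.t.get((ctx, i), self._default(i))
        hi = self.t.get((ctx, i + 1), self._default(i + 1))
        return lo * (1 - frac) + hi * frac

    def update(self, ctx, p, bit, rate=0.001):
        i, frac = self._pos(p)
        j = i + 1 if frac >= 0.5 else i       # closest entry only
        cur = self.t.get((ctx, j), self._default(j))
        self.t[(ctx, j)] = cur + (bit - cur) * rate
```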

t is a 2 input mixer for the last 2 components, which are the SSE
input and output.

This is just a simple context model with no gaps and no word modeling.
I didn't try word modeling because the input was already dictionary
preprocessed. But it does help on the original text:

zpaq a "" "enwik5" -ms4.0ci1.1.1.1amst -> 28216 (94 ms)
zpaq a "" "enwik5" -ms4.0ci1.1.1.1awmst -> 27316 (94 ms)
zpaq a "" "enwik5" -ms4.0ci1.1.1.1aw2mst -> 27288 (109 ms)

w selects an ICM where the context is the current word (A-Z, case
insensitive). w2 is an order 0-1 ICM-ISSE chain, but where the
contexts are words instead of bytes. (w3 and higher make compression
worse).
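The word context itself can be sketched as a running hash, where any
non-letter resets to the empty word. The hash function here is
illustrative, not zpaq's:

```python
def word_context(history):
    """Sketch: hash of the current word, A-Z folded to lower
    case; any non-letter byte resets to the empty word."""
    h = 0
    for b in history:
        c = chr(b).lower()
        if "a" <= c <= "z":
            h = (h * 31 + ord(c)) & 0xFFFFFFFF  # illustrative hash
        else:
            h = 0
    return h
```

So "The CAT" and "a cat" end in the same context, because only the
current word (case folded) matters.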

You can find a more detailed description of the compression algorithm
on the main page. https://mattmahoney.net/dc/zpaq.html

-- 
-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tf0bedfcd44454678-Mcb0b16824e7f471babbee651
Delivery options: https://agi.topicbox.com/groups/agi/subscription