I used the preprocessed enwik5 that you posted to the forum. It looks like
you are using PPM, which was once at the top of the benchmarks until PAQ
beat it. The problem with PPM is that it can only model contiguous
contexts. You can't model semantics like rain...wet with arbitrary text
in between.

I believe some of the top PPM compressors like ppmonstr or durilca use SSE
(secondary symbol estimation), also called APM (adaptive probability map)
in PAQ. Just before arithmetic coding, the final bit probability and a
small context, like the last 1 or 2 bytes, index a 2-D table that gives a
new prediction. The table entry is then updated to reduce the prediction
error.
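As a minimal sketch of the idea (table sizes, learning rate, and names here are my own illustration, not actual PAQ or ppmonstr code): the input probability is quantized, paired with a small context to index a 2-D table, and the looked-up entry is nudged toward the observed bit after coding.

```python
# Hedged sketch of SSE/APM: refine a model's bit probability using a
# small context. All constants are illustrative.

class SSE:
    def __init__(self, num_contexts=256, num_buckets=33, rate=0.02):
        # 2-D table: (context, quantized input probability) -> refined
        # probability. Initialized as the identity map, so at first the
        # output equals the bucket center of the input.
        self.num_buckets = num_buckets
        self.rate = rate
        self.table = [[i / (num_buckets - 1) for i in range(num_buckets)]
                      for _ in range(num_contexts)]
        self._last = None  # remember last lookup for update()

    def predict(self, p, context):
        # Quantize input probability p in [0,1] to the nearest bucket.
        i = round(p * (self.num_buckets - 1))
        self._last = (context, i)
        return self.table[context][i]

    def update(self, bit):
        # Nudge the looked-up entry toward the observed bit (0 or 1),
        # reducing future prediction error.
        ctx, i = self._last
        self.table[ctx][i] += self.rate * (bit - self.table[ctx][i])

sse = SSE()
p = sse.predict(0.8, context=65)  # model says 0.8 when last byte is 'A'
sse.update(1)                     # the bit was actually 1; table adapts
```

After many updates, the table corrects systematic biases in the base model that depend on the small context, which is why SSE chained after a strong model usually gains a little extra compression.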

In PAQ and ZPAQ, the input prediction is stretched and quantized to 32
levels, the output is interpolated between the two closest table entries,
and only the closer one is updated. Often this output is mixed with the
input by simple weighted averaging, like 3/4 output and 1/4 input. ZPAQ
allows SSE (and other) components anywhere in the context mixing graph,
so you could chain them together in series or mix them in parallel with
different contexts. They aren't normally used for large contexts because
other components like CM, ICM, ISSE, and MATCH are more memory efficient
and less prone to overfitting.
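The stretch/interpolate/update steps above can be sketched like this (the clamp range, node count, learning rate, and 3/4 : 1/4 mix are illustrative stand-ins, not the exact values from the PAQ or ZPAQ sources):

```python
import math

# Sketch of a PAQ/ZPAQ-style APM: stretch the input probability to the
# logit domain, map it onto 33 nodes spanning 32 quantization levels,
# interpolate between the two nearest nodes, and update only the closer
# node. Constants are illustrative.

STRETCH_RANGE = 8.0  # clamp stretched values to [-8, 8]
NODES = 33           # 32 intervals -> 33 table entries per context

def stretch(p):
    # logit, clamped to a finite range
    return max(-STRETCH_RANGE, min(STRETCH_RANGE, math.log(p / (1 - p))))

def squash(x):
    # inverse of stretch
    return 1 / (1 + math.exp(-x))

class APM:
    def __init__(self, num_contexts=256, rate=0.02):
        # Each row starts as the identity map: node i stores the
        # probability whose stretched value lies at that node.
        self.table = [[squash(STRETCH_RANGE * (2 * i / (NODES - 1) - 1))
                       for i in range(NODES)] for _ in range(num_contexts)]
        self.rate = rate
        self._closer = None

    def predict(self, p, context):
        x = stretch(p)
        pos = (x + STRETCH_RANGE) / (2 * STRETCH_RANGE) * (NODES - 1)
        lo = min(int(pos), NODES - 2)
        w = pos - lo                        # interpolation weight
        row = self.table[context]
        # Remember the closer of the two nodes; only it gets updated.
        self._closer = (context, lo + 1 if w > 0.5 else lo)
        out = (1 - w) * row[lo] + w * row[lo + 1]
        # Weighted average: 3/4 table output, 1/4 original prediction.
        return 0.75 * out + 0.25 * p

    def update(self, bit):
        ctx, i = self._closer
        self.table[ctx][i] += self.rate * (bit - self.table[ctx][i])
```

Updating only the closer node keeps neighboring entries independent, so a burst of updates in one region of the probability scale doesn't drag the rest of the curve with it.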

Also, I am not convinced that transformers are the best architecture for
online learning. LLMs are trained offline and the weights are then frozen
to prevent information leaking between users. This requires very large
context windows, which would be unnecessary if the weights were updated by
user input, as in normal compression. In that case it would be simpler to
use a short-term memory of several low-frequency tokens as context,
similar to how attention works in our own brains.

-- Matt Mahoney, [email protected]

On Sun, Dec 7, 2025, 6:09 PM <[email protected]> wrote:

> (Also, I want to say again it appears I have uncovered the answers and
> want everyone to take a deeper look at my forum thread's latest posts and
> my 2 replies in this thread too. I explain how to make my algorithm go to
> full potential and that is what self attention is doing building a
> on-the-fly understanding (narrowing down what memories to mix)).
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tf0bedfcd44454678-M74853bbf57165be9966f5007
Delivery options: https://agi.topicbox.com/groups/agi/subscription
