I used the preprocessed enwik5 that you posted to the forum. It looks like you are using PPM, which was at the top of the compression benchmarks until PAQ beat it. The problem with PPM is that it can only model contiguous contexts. You can't model semantic associations like rain...wet with arbitrary text in between.
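To illustrate the point (a hypothetical sketch, not how PPM is actually implemented): a contiguous order-N context only sees the N bytes immediately before the current position, so "rain" drops out of the context as soon as the intervening text is longer than N, while a sparse "recent words" context still captures it. The function names and the window sizes here are made up for the example.

```python
def contiguous_context(text, pos, order=8):
    # PPM-style context: only the `order` bytes immediately before pos
    return text[max(0, pos - order):pos]

def sparse_word_context(text, pos, window=60):
    # hypothetical sparse context: the set of recent words, gaps ignored
    return frozenset(text[max(0, pos - window):pos].split())

a = "rain fell and the street is wet"
b = "rain fell; hours later the long road was wet"
i, j = a.index("wet"), b.index("wet")

# The contiguous contexts before "wet" differ, so PPM cannot share
# statistics between the two occurrences...
assert contiguous_context(a, i) != contiguous_context(b, j)

# ...but a sparse context keyed on recent words sees "rain" in both.
assert "rain" in sparse_word_context(a, i)
assert "rain" in sparse_word_context(b, j)
```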
I believe some of the top PPM compressors like ppmonstr or durilca use SSE (secondary symbol estimation), called APM (adaptive probability map) in PAQ. Just before the final bit prediction goes to the arithmetic coder, the probability and a small context (like the last 1 or 2 bytes) are mapped through a 2-D table to give a new prediction. Then the table entry is updated to reduce the prediction error. In PAQ and ZPAQ, the input prediction is stretched and quantized to 32 levels, the output is interpolated between the two closest entries, and only the closer one is updated. Often this output is mixed with the input by simple weighted averaging, like 3/4 output and 1/4 input. ZPAQ allows SSE (and other) components anywhere in the context mixing graph, so you could chain them in series or mix them in parallel with different contexts. They aren't normally used for large contexts because other components like CM, ICM, ISSE, and MATCH are more memory efficient and less prone to overfitting.

Also, I am not convinced that transformers are the best architecture for online learning. LLMs are trained offline and then the weights are frozen to prevent information leaking between users. This requires very large context windows, which would be unnecessary if the weights were updated by user input, as in normal compression. In that case it would be simpler to use a short term memory of several low frequency tokens for context, similar to how attention works in our own brains.

-- Matt Mahoney, [email protected]

On Sun, Dec 7, 2025, 6:09 PM <[email protected]> wrote:

> (Also, I want to say again it appears I have uncovered the answers and
> want everyone to take a deeper look at my forum thread's latest posts and
> my 2 replies in this thread too. I explain how to make my algorithm go to
> full potential and that is what self attention is doing: building an
> on-the-fly understanding (narrowing down what memories to mix).)
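The SSE/APM scheme described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not PAQ's or ZPAQ's actual code: the 33-entry rows (33 grid points giving 32 intervals), the clipping of the stretch to [-8, 8], and the fixed learning rate are all assumptions; PAQ uses count-dependent update rates and fixed-point arithmetic.

```python
import math

def stretch(p):
    # logit: maps a probability in (0,1) to (-inf, inf)
    p = min(max(p, 1e-6), 1 - 1e-6)
    return math.log(p / (1 - p))

def squash(x):
    # inverse of stretch
    return 1 / (1 + math.exp(-x))

class APM:
    """Secondary symbol estimation: maps (context, input prediction)
    to a refined prediction through an adaptively updated 2-D table."""
    def __init__(self, n_ctx, n_quant=33, rate=0.02):
        self.n_quant = n_quant
        self.rate = rate  # assumed fixed learning rate
        # Initialize each row so that output == input prediction:
        # grid point j holds squash of the j-th stretch value in [-8, 8].
        self.t = [[squash(-8 + 16 * j / (n_quant - 1))
                   for j in range(n_quant)] for _ in range(n_ctx)]
        self.last = None  # (ctx, index, interpolation weight)

    def predict(self, ctx, p):
        x = max(-8.0, min(8.0, stretch(p)))
        pos = (x + 8) * (self.n_quant - 1) / 16  # fractional grid index
        i = min(int(pos), self.n_quant - 2)
        w = pos - i  # weight toward entry i+1
        self.last = (ctx, i, w)
        row = self.t[ctx]
        # interpolate between the two closest entries
        return row[i] * (1 - w) + row[i + 1] * w

    def update(self, bit):
        # update only the closer of the two entries toward the actual bit
        ctx, i, w = self.last
        j = i + 1 if w > 0.5 else i
        self.t[ctx][j] += self.rate * (bit - self.t[ctx][j])
```

Usage, including the weighted averaging mentioned above: `p2 = apm.predict(ctx, p); final = 0.75 * p2 + 0.25 * p`, then after coding the bit, `apm.update(bit)`. Initially `predict` returns approximately its input; as the table adapts, it corrects the systematic bias of the input prediction under each small context.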
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tf0bedfcd44454678-M74853bbf57165be9966f5007
Delivery options: https://agi.topicbox.com/groups/agi/subscription
