On Fri, Jan 31, 2020, 5:05 PM <[email protected]> wrote:

> Gotta hold off on that BWT, it's losing patterns Matt!! I don't feel good
> about it. And the PPM & Cmixing is what I already do ... what are u
> sayinggg?? I mix together many partial matches.
>

You can read about BWT and PPM in my book to understand their limitations
relative to CM. BWT sorts the input bytes by context (it is not obvious, but
there is a fast inverse algorithm). This is convenient for bringing
together contexts that make the same prediction, resulting in runs of
identical characters. But once the bytes are sorted on that one context,
the other contexts are no longer available to the model. BWT does poorly
on sorted data, images, and concatenations of different types like tar
files.
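
To make the sorting and the inverse concrete, here is a minimal sketch in
Python. It is a naive version that sorts whole rotations, so it is far
slower than the suffix sorting a real BWT compressor would use, and the
function names and the returned row index are my own choices for
illustration.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive BWT: sort all rotations, keep the last column and the row
    index where the original string ends up."""
    n = len(data)
    if n == 0:
        return b"", 0
    # Sorting the rotations groups bytes by the context that follows them.
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in rotations)
    return last, rotations.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the transform using a stable sort of the last column."""
    n = len(last)
    # order[i] is the position in the last column of the byte that
    # appears at position i in the (sorted) first column.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)


transformed, primary = bwt(b"banana")
restored = inverse_bwt(transformed, primary)
```

On b"banana" the last column comes out as b"nnbaaa": the n's and a's are
gathered into runs because they precede similar contexts.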

PPM predicts at the byte level, whereas CM predicts at the bit level. So PPM
has to estimate 256 probabilities. For each possibility, it finds the
longest context where it was seen at least once. For each context of length
n, it calculates the distribution of the characters seen and estimates the
remaining probability for unseen characters, which is divided using the
statistics from the order n-1 context. The main limitation is that you are
restricted to contiguous contexts, which is OK for text but misses
two-dimensional contexts like in images and databases.
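
The escape mechanism described above can be sketched as follows. This is a
toy model, not code from any real PPM compressor. I am assuming one common
way to estimate the escape probability (the escape count equals the number
of distinct characters seen in the context, often called method C), and I
exclude characters already predicted at a higher order so the 256
probabilities add up to 1. The class name PPMSketch and its methods are
hypothetical.

```python
from collections import Counter, defaultdict


class PPMSketch:
    """Toy byte-level PPM predictor with escape to lower orders."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # Maps a context (the last n bytes) to counts of the next byte.
        self.counts = defaultdict(Counter)

    def update(self, history, symbol):
        # Record the symbol under every context order from 0 to max_order.
        for n in range(min(self.max_order, len(history)) + 1):
            self.counts[history[len(history) - n:]][symbol] += 1

    def predict(self, history):
        probs = [0.0] * 256
        excluded = set()   # symbols already given mass at a higher order
        remaining = 1.0    # probability mass passed down by escapes
        for n in range(min(self.max_order, len(history)), -1, -1):
            ctx = history[len(history) - n:]
            seen = {s: c for s, c in self.counts.get(ctx, {}).items()
                    if s not in excluded}
            if not seen:
                continue
            # Assumed escape estimate: one escape count per distinct symbol.
            denom = sum(seen.values()) + len(seen)
            for s, c in seen.items():
                probs[s] += remaining * c / denom
            remaining *= len(seen) / denom
            excluded.update(seen)
        # Order -1: spread what is left uniformly over never-seen bytes.
        unseen = [s for s in range(256) if s not in excluded]
        for s in unseen:
            probs[s] += remaining / len(unseen)
        if not unseen:
            total = sum(probs)
            probs = [p / total for p in probs]
        return probs


data = b"abracadabra"
model = PPMSketch(max_order=2)
for i, sym in enumerate(data):
    model.update(data[:i], sym)
probs = model.predict(b"ab")
```

After training on b"abracadabra", the order 2 context "ab" has only ever
been followed by "r", so "r" gets 2/3 of the probability and the remaining
1/3 escapes to the lower orders.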

CM of course allows arbitrary contexts. PAQ has code to detect different
file types and adds contexts useful for each type. It detects fixed record
lengths like in images and adds 2-D contexts like neighboring pixels in the
previous scan line. It detects JPEG, decodes the DCT coefficients, and
models them to compress another 30%. It detects x86 code and models the
relative addresses in CALL and JMP instructions by converting them to
absolute addresses.
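
As an illustration of the x86 case, here is a minimal sketch of such an
address transform. It is not PAQ's actual code. It assumes 32-bit code
loaded at address 0, treats every 0xE8 (CALL) or 0xE9 (JMP) byte as an
opcode followed by a 4-byte little-endian relative offset, and skips the
detection heuristics a real compressor would need. The function name
x86_transform is hypothetical.

```python
import struct


def x86_transform(code: bytes, encode: bool = True) -> bytes:
    """Rewrite CALL/JMP operands as absolute (encode) or relative (decode)."""
    buf = bytearray(code)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):
            (value,) = struct.unpack_from("<I", buf, i + 1)
            next_instr = i + 5  # relative offsets count from here
            if encode:
                value = (value + next_instr) & 0xFFFFFFFF
            else:
                value = (value - next_instr) & 0xFFFFFFFF
            struct.pack_into("<I", buf, i + 1, value)
            i += 5  # skip the operand so both directions scan identically
        else:
            i += 1
    return bytes(buf)


# Two CALLs to the same target (0x100) from offsets 0 and 8.
code = (b"\xE8" + struct.pack("<I", 0x100 - 5)
        + b"\x90" * 3
        + b"\xE8" + struct.pack("<I", 0x100 - 13))
encoded = x86_transform(code, encode=True)
decoded = x86_transform(encoded, encode=False)
```

Before the transform the two operands differ (0xFB and 0xF3). Afterward
both are 0x100, so a context model sees the same four bytes every time the
same function is called.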

All of this complexity is in keeping with Legg's proof that powerful
predictors are necessarily complex. You end up writing lots of code to
handle special cases and obscure file types to squeeze out just a little
more compression. You can see why it can take 10 years or more to develop a
good compressor.


------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T409fc28ec41e6e3a-M51bc548c0582ee481ed2106b