On Tue, Mar 31, 2020 at 7:57 PM <immortal.discover...@gmail.com> wrote:
>
> New: Same 10,000,000 bytes losslessly compressed to 2,305,386 bytes. Same 
> code.

Some results with zpaq -m1 through -m5 on enwik7 (the first 10 MB of
enwik8 or enwik9). Compression and decompression times are in seconds
on a 2.53 GHz Core i5 M540 with 4 GB RAM under Windows 7.

        size (bytes)  compress (s)  decompress (s)
-m1          3717520          1.03            0.25
-m2          3276869          4.63            0.29
-m3          2375046          4.72            4.71
-m4          2188214         14.21           14.91
-m5          2091360         45.47           46.52

Methods m1 and m2 use LZ77. m2 spends more time finding better matches
(a suffix array to find the longest match instead of a hash table) and
adds 1 byte of lookahead. m3 uses BWT followed by an order 0-1
ICM-ISSE chain for modeling. m4 and m5 use context modeling. Both have
ICM-ISSE chains and whole-word contexts, but m5 has additional models.
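
As a rough sketch of the hash table approach (my own illustration, not
zpaq's actual code; all names and constants are made up), greedy LZ77
match finding looks like this:

  // Greedy LZ77 match finding with a hash table, in the spirit of
  // -m1. Illustrative only: a real implementation would also hash
  // the positions inside a match and handle the last 3 bytes.
  #include <cstdint>
  #include <cstdio>
  #include <string>
  #include <vector>

  // Hash the 4 bytes at position i to a 16-bit table index.
  uint32_t hash4(const std::string& buf, size_t i) {
      uint32_t h = 0;
      for (int j = 0; j < 4; ++j)
          h = h * 2654435761u + (uint8_t)buf[i + j];
      return h >> 16;
  }

  int main() {
      std::string buf = "the quick brown fox and the quick brown dog";
      std::vector<int> table(1 << 16, -1);  // hash -> last position seen
      for (size_t i = 0; i + 4 <= buf.size(); ) {
          int j = table[hash4(buf, i)];
          size_t len = 0;
          if (j >= 0)  // candidate: verify and extend byte by byte
              while (i + len < buf.size() && buf[j + len] == buf[i + len])
                  ++len;
          table[hash4(buf, i)] = (int)i;
          if (len >= 4) {  // code a match instead of literals
              printf("match offset=%zu len=%zu\n", i - (size_t)j, len);
              i += len;
          } else {
              printf("literal '%c'\n", buf[i]);
              ++i;
          }
      }
  }

The hash table remembers only the most recent position for each hash,
so it finds a match quickly but not necessarily the longest one; that
is the gap the suffix array in m2 closes.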

BWT is the Burrows-Wheeler transform, in which the bytes are sorted by
context to bring similar contexts together. An ICM-ISSE chain starts
with an order 0 ICM (indirect context model), followed by ISSE
(indirect secondary symbol estimation) predictors with increasingly
long contexts. An ICM maps a context to a bit history, which is mapped
to a prediction. An ISSE is a 2 input mixer (neural network) with one
input taken from the previous component's stretched prediction and the
other fixed at 1. The bit history is used to select a pair of weights.
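
Here is a minimal sketch of an ISSE (my own simplification: an 8-bit
shift register stands in for zpaq's real bit history states, and the
learning rate is arbitrary):

  // A 2 input mixer whose weight pair is selected by a bit history.
  // One input is the stretched prediction of the previous component,
  // the other is fixed at 1.
  #include <cmath>
  #include <cstdio>

  double stretch(double p) { return log(p / (1 - p)); }
  double squash(double x)  { return 1 / (1 + exp(-x)); }

  struct ISSE {
      double w0[256], w1[256];  // one weight pair per history value
      ISSE() { for (int i = 0; i < 256; ++i) { w0[i] = 1; w1[i] = 0; } }

      // Refine the previous component's prediction p_in, given history h.
      double predict(double p_in, int h) {
          return squash(w0[h] * stretch(p_in) + w1[h] * 1.0);
      }

      // Move the selected weight pair toward the observed bit.
      void update(double p_in, int h, int bit, double lr) {
          double err = bit - predict(p_in, h);
          w0[h] += lr * err * stretch(p_in);
          w1[h] += lr * err * 1.0;  // the second input is the constant 1
      }
  };

  int main() {
      ISSE isse;
      int h = 0;             // last 8 bits seen in this context
      double p_prev = 0.5;   // stand-in for the previous component
      for (int bit : {0,0,0,0,0,0,0,0,0,1}) {
          printf("p(1)=%.3f observed=%d\n", isse.predict(p_prev, h), bit);
          isse.update(p_prev, h, bit, 0.1);
          h = ((h << 1) | bit) & 255;
      }
  }

Selecting the weight pair by bit history is the indirect part:
contexts that have produced the same recent bit pattern share the same
weights, so the mixer learns how far to trust the previous component
in each situation.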

The idea of a bit history is to save the last several bits in a
context. If you see a sequence like 0000000001, what is the
probability that the next bit is 1? If the data is stationary, then it
is 1/10. But a highly adaptive model might give a higher probability.
An indirect model solves this problem by saving statistics from other
contexts that observed the same sequence.
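
A toy version of that indirection (again my own sketch, with an 8-bit
history and an arbitrary adaptation rate):

  // Each context stores only its recent bits; a single shared table
  // maps bit history -> P(next bit = 1), so contexts that produced
  // the same history pool their statistics.
  #include <cstdio>
  #include <map>

  struct IndirectModel {
      std::map<int, int> history;  // context -> last 8 bits seen there
      double p1[256];              // shared: history -> P(next bit = 1)
      IndirectModel() { for (int i = 0; i < 256; ++i) p1[i] = 0.5; }

      double predict(int ctx) { return p1[history[ctx]]; }

      void update(int ctx, int bit) {
          int& h = history[ctx];
          p1[h] += 0.1 * (bit - p1[h]);  // adapt the shared statistic
          h = ((h << 1) | bit) & 255;    // append bit to this history
      }
  };

  int main() {
      IndirectModel m;
      for (int bit : {0,0,0,0,0,0,0,0,0,1}) m.update(7, bit);
      // Any context whose history is now 00000000, not just context 7,
      // gets the prediction learned from this sequence.
      printf("P(1 | history=00000000) = %.3f\n", m.p1[0]);
  }

Because the table is indexed by history rather than by context, a
brand new context that happens to produce 00000000 immediately
inherits whatever followed that pattern elsewhere.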

This paper describes the compression algorithm in more detail.
http://mattmahoney.net/dc/zpaq_compression.pdf

-- 
-- Matt Mahoney, mattmahone...@gmail.com
