On Fri, Jan 31, 2020, 3:49 AM <[email protected]> wrote:
> that sounds like the cool way to do it :), i do it the easy way and just
> use a binary key store... and i get my compression by sharing sections of
> the keys.
Any benchmark results? I would be interested to see whether it improves compression.

Compression is a highly experimental process. Most of the things I tried either didn't work or gave only tiny improvements. In earlier versions (paq6) I modeled contexts by counting 0s and 1s and mixed the predictions by weighted addition of the counts. Even that was good enough to win the Calgary compression challenge, beating PPM, then the best known algorithm.

Logistic mixing turned out to work even better and is simpler. The update rule is gradient descent in weight space, and it is simpler than back propagation because it minimizes coding cost instead of root mean square error. Instead of w += L*x*(b - p)*p*(1 - p) (where p is the output probability, b is the predicted bit, x is the input, and L is the learning rate), taking the partial derivative of the coding cost makes the factor p*(1 - p) cancel, leaving just w += L*x*(b - p).

But there is a lot more to text compression than that. You have whole-word contexts and sparse contexts that skip bits, characters, or words. You have match models that search for long matching contexts and predict whatever followed, with a weight proportional to the match length.

You can also model the same contexts in different ways. For example, if you observe a sequence like 0000000001 in some context, what is the next bit? Fast- and slow-adapting context models will give different predictions, which you can mix. Paq makes extensive use of indirect context models, where the sequence observed in a context is mapped to a table of 0 and 1 counts, so the answer comes from the actual data.

You can also play with the mixers. You can use a small (8-16 bit) context to select the mixer weights, and build a tree of mixers with different contexts and learning rates, with the context models at the leaves and the final prediction at the root. Paq also tunes the prediction using SSE (secondary symbol estimation), a table that maps a small context and a quantized prediction to a new prediction, and mixes those outputs too.
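To make the update rule concrete, here is a minimal Python sketch of a logistic mixer using the coding-cost gradient w += L*x*(b - p). The class name, learning rate, and structure are my own illustration, not taken from any paq source file:

```python
import math

def squash(x):
    # logistic function: map a stretched value back to a probability
    return 1.0 / (1.0 + math.exp(-x))

def stretch(p):
    # inverse of squash: ln(p / (1 - p))
    return math.log(p / (1.0 - p))

class LogisticMixer:
    """Toy logistic mixer: combine model probabilities in the
    stretched (logit) domain, then update weights by gradient
    descent on coding cost: w += L * x * (b - p)."""

    def __init__(self, n, lr=0.01):
        self.w = [0.0] * n   # one weight per input model
        self.lr = lr         # learning rate L
        self.x = [0.0] * n   # last stretched inputs
        self.p = 0.5         # last mixed prediction

    def predict(self, probs):
        # mix in the stretched domain, squash back to a probability
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        # note there is no p*(1-p) factor: it cancels when you
        # differentiate coding cost instead of squared error
        err = bit - self.p
        for i in range(len(self.w)):
            self.w[i] += self.lr * self.x[i] * err
```

With zero initial weights the mixer starts at p = 0.5; as one input model proves accurate, its weight grows and the mixed prediction follows it.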
The best compressors preprocess the input by using special symbols to indicate upper case letters and a dictionary that maps common words to symbols. The dictionary is organized to group semantically and syntactically related words so that bitwise sparse models can recognize the groups.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T409fc28ec41e6e3a-Me2c30d70e9480036dabaf6a6
Delivery options: https://agi.topicbox.com/groups/agi/subscription
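A toy Python sketch of that kind of dictionary preprocessing, with an invented escape byte for capitalization and a made-up three-word dictionary (real preprocessors use thousands of words grouped by meaning and syntax):

```python
# Assumed, illustrative codes -- not the actual symbols any
# real compressor uses.
CAP = '\x01'  # escape: the following word was capitalized
DICT = {'the': '\x80', 'and': '\x81', 'compression': '\x82'}
INV = {v: k for k, v in DICT.items()}

def encode(text):
    # lowercase each word, flag capitals, replace dictionary words
    out = []
    for word in text.split(' '):
        prefix = CAP if word[:1].isupper() else ''
        out.append(prefix + DICT.get(word.lower(), word.lower()))
    return ' '.join(out)

def decode(coded):
    # reverse the substitution and restore capitalization
    out = []
    for tok in coded.split(' '):
        cap = tok.startswith(CAP)
        if cap:
            tok = tok[1:]
        word = INV.get(tok, tok)
        out.append(word.capitalize() if cap else word)
    return ' '.join(out)
```

The transform is reversible, and the output stream is both shorter and more uniform, which is what lets the bitwise models behind it do their job.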
