PAQ mixes the probabilities from the context models by stretching: x = ln(p/(1-p)), then taking a weighted sum with a neural network, then squashing the output with the inverse function p = 1/(1+e^-x). The weights are updated as w += Lx(b - p), where b is the actual bit, (b - p) is the output error, x is the stretched input, and L is the learning rate, about 0.001. For good compression I use a 16-bit integer for p, 12 bits for x in the range -8 to 8, and 20 bits for w in the range -8 to 8.
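A minimal floating-point sketch of that mixer (PAQ itself uses the fixed-point representations above; the class and parameter names here are illustrative, not PAQ's actual code):

```python
import math

def stretch(p):
    # x = ln(p / (1 - p)), the logit of probability p
    return math.log(p / (1.0 - p))

def squash(x):
    # inverse of stretch: p = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    """Logistic mixing of n model probabilities.

    Floating point for clarity; PAQ stores p as a 16-bit integer,
    x as 12 bits in [-8, 8], and w as 20 bits in [-8, 8].
    """
    def __init__(self, n, learning_rate=0.001):
        self.w = [0.0] * n          # mixing weights, start at zero
        self.lr = learning_rate     # L, about 0.001
        self.x = [0.0] * n          # last stretched inputs

    def predict(self, probs):
        # stretch each model's p, clamped to the [-8, 8] range
        self.x = [max(-8.0, min(8.0, stretch(p))) for p in probs]
        # weighted sum, then squash back to a probability
        return squash(sum(w * x for w, x in zip(self.w, self.x)))

    def update(self, p, bit):
        # w += L * x * (b - p): gradient step that reduces coding loss
        err = bit - p
        for i in range(len(self.w)):
            self.w[i] += self.lr * self.x[i] * err
```

With zero initial weights the mixer outputs squash(0) = 0.5; as bits arrive, models whose stretched predictions correlate with the actual bits gain weight.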
On Thu, Jan 30, 2020, 1:27 AM <[email protected]> wrote:

> an exponential curve to do with text generation models, is history
> attention being log proportional to how many patterns you need. so you
> need more than 10% compression, you need an impossible amount.
