PAQ mixes the probabilities from the context models by stretching each one:
x = ln(p/(1-p)); then taking a weighted sum with a simple neural network;
then squashing the output with the inverse function p = 1/(1+e^-x). The
weights are updated as w += L*x*(b-p), where b is the actual bit, b-p is the
output error, x is the stretched input, and L is the learning rate, about
0.001. For good compression I use a 16-bit integer for p, 12 bits for x in
the range -8 to 8, and 20 bits for w in the range -8 to 8.
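A minimal floating-point sketch of that mixer (the class and method names
here are illustrative, not from PAQ, and a real implementation would use the
fixed-point integer formats described above):

```python
import math

def stretch(p):
    # x = ln(p/(1-p)), clamped to the stated range [-8, 8]
    return max(-8.0, min(8.0, math.log(p / (1.0 - p))))

def squash(x):
    # inverse of stretch: p = 1/(1+e^-x)
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    def __init__(self, n, lr=0.001):
        self.w = [0.0] * n    # one weight per context model, clamped to [-8, 8]
        self.x = [0.0] * n    # stretched inputs from the last prediction
        self.lr = lr          # learning rate L

    def predict(self, probs):
        # stretch each model's probability, then squash the weighted sum
        self.x = [stretch(p) for p in probs]
        return squash(sum(w * x for w, x in zip(self.w, self.x)))

    def update(self, p, bit):
        # w += L * x * (b - p), where (b - p) is the output error
        err = bit - p
        for i in range(len(self.w)):
            self.w[i] = max(-8.0, min(8.0,
                            self.w[i] + self.lr * self.x[i] * err))
```

With all weights zero the mixer starts at p = 0.5; repeated updates toward
the same bit push the output toward that bit, faster for models whose
stretched input is large.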

On Thu, Jan 30, 2020, 1:27 AM <[email protected]> wrote:

> an exponential curve to do with text generation models is history
> attention being log proportional to how many patterns you need. So you
> need more than 10% compression; you need an impossible amount.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T409fc28ec41e6e3a-Mebce312d627d11cec4306b1c
Delivery options: https://agi.topicbox.com/groups/agi/subscription
