The preprocessed enwik5 I posted is a year or two outdated now; did you use the 
2024 version (from Byron's page)? I posted it on the forum thread recently and 
it improved my score by about 100 bytes. Is your 20,201 with that version? If 
not, what do you get with it?

I looked at Transformers again about 3 weeks ago. It seems self-attention adds 
up scores so that each input word gets an amount of how much it will push into 
the next layer, so it's basically a tree doing a > b conditionals. It also 
disambiguates during self-attention whether "bank" means river bank or 
financial bank, which aligns with my recent posts/AI on my project thread on 
the forum. The same goes for related word embeddings (which can be done 
without embeddings too; I made that 4 years ago), and it also clearly does the 
priming effect (cat cat cat predicts cat next) and handles delays in where the 
words can appear. The main difference is that Transformers are currently the 
best, but they are also slow, large, and complex code, and may lack some 
capabilities such as online learning. The goal is to do the same thing with 
small, simple, fast code, which also opens up more capabilities.
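For what it's worth, the score-summing described above is scaled dot-product 
self-attention: each word's output is a weighted sum of every word's value 
vector, with the weights computed from query/key scores. A minimal NumPy 
sketch (the matrix sizes and random weights here are made-up toy values, not 
anyone's actual model):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax; rows become probability weights
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over word vectors X (one row per word).
    Each output row is a weighted sum of all value vectors, i.e. every input
    word contributes an amount ("pushes") into the next layer."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how much word i attends to word j
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

# toy example: 3 "words", 4-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
```

Because the weights depend on all the other words in the sentence, the same 
token "bank" ends up with a different output vector next to "river" than next 
to "loan", which is the disambiguation effect mentioned above.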
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tf0bedfcd44454678-M747199072c6871c71643368d
Delivery options: https://agi.topicbox.com/groups/agi/subscription
