The preprocessed enwik5 I posted is a year or two out of date. Did you use the 2024 version (from Byron's page)? I posted it on the forum thread recently, and it improved my score by about 100 bytes. Is your 20,201 with that version? If not, what score do you get with it?
I looked at Transformers again about 3 weeks ago. It seems self-attention adds up scores so that each input word ends up with an amount of how much it pushes into the next layer, so it's basically a tree doing a > b conditionals. Self-attention also clarifies whether "bank" means river bank or financial bank, which aligns with my recent posts/AI on my project thread on the forum. It also covers related word embeds (which can be done without embeds too; I made that 4 years ago), it clearly does the priming ("cat cat cat" predicts "cat" next), and it handles delays in where the words can be.

The main difference is that Transformers are currently the best, but they are also slow, large, and complex in code, and may have some disabilities such as no online learning. The goal is to do the same thing with small, simple, fast code, which also allows more capabilities to be opened up.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tf0bedfcd44454678-M747199072c6871c71643368d
Delivery options: https://agi.topicbox.com/groups/agi/subscription
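P.S. Here is a minimal sketch of the self-attention idea I described (score summing, and the "cat cat cat" priming effect). It's a toy, not a real Transformer: the embeddings are made up, and the query/key/value projections are dropped (identity matrices) so the score of each word pair is just a dot product.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    """Single-head self-attention with identity projections (toy sketch).

    Each row of X is a word vector. Scores are dot products, so similar
    or repeated words reinforce each other, and each word's output is a
    weighted mix that gets pushed into the next layer."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)   # how much each word attends to every word
    weights = softmax(scores)       # each row sums to 1
    return weights @ X              # weighted mix per word

# made-up 2-d embeddings: "cat" repeated three times plus one unrelated word
cat = np.array([1.0, 0.0])
dog = np.array([0.0, 1.0])
X = np.stack([cat, cat, cat, dog])
out = self_attention(X)
# the repeated "cat" rows dominate the mix: even the unrelated word's
# output leans toward the cat direction, i.e. the priming effect
```

In this toy, the last word's output ends up roughly 60% "cat" and 40% "dog" purely because "cat" appears three times, which is the repetition boost in a nutshell.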
