https://agi.topicbox.com/groups/agi/T0518db1e3a0c25c5/preprocessor-for-hutter-prize

Good messages here, I just saw them now and still need to read them. Yes (lol), "water water" is rare; I also have mine lower it right after it sees a word, to prime it more strongly.
Matt, here is how my semantic model works, below. Is mine better or worse than yours?

My current implementation just builds a 2-byte-depth tree. The first breadth (the root bytes) is the word on the right side of each word pair (e.g. "dog sleeps"): I take every 2 adjacent words in the dataset and swap their order, so I can store them as "sleeps dog". This makes building the word relations faster, because the root starts with the shared proofs first.

After the tree is built, I have a list of 256 lists, each containing 256 zeros. This holds the relation score (0.0-1.0) for every byte to every byte. Then I check every root byte. Say the root is "sleeps", and in the mini tree we have dog, cat, horse, and human after it. I go to the 256 lists we made and give each of dog, cat, horse, and human a score against each other, since they are what we have at hand after this root byte.

The score is calculated this way. How many counts does dog have in total? Say 550. How many does cat have? 1000. If dog is lower, I normalize it; in this example dog is about 2x rarer in the dataset. Then I check how many counts they share here in the mini tree: say we have "sleeps dog" x2 and "sleeps cat" x10. Since dog has fewer total counts, I scale "sleeps dog" up artificially from 2 to 2*2 = 4. So now dog and cat share 4. I store this in dog's list as 4 / total counts, i.e. as a PART of the final amount. Because if dog and cat share 4 but dog has 1 billion counts in the dataset (and so does cat, after normalization), then while they share 4 they really share almost nothing; if they shared 100% they would share 1 billion as well. We come back to dog's list at the NEXT proof: if dog and cat share more proofs, we add more score on top of the score we just saved. That was only 1 part so far, I'm saying.
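Here is a minimal word-level sketch of the scoring described above. All names are mine, not from the actual implementation (which works on bytes in a 2-byte-depth tree), and this omits the downranking of too-common proofs mentioned next:

```python
from collections import Counter, defaultdict

def build_reversed_pairs(words):
    """Store every adjacent word pair reversed: 'dog sleeps' -> ('sleeps', 'dog'),
    so pairs that share a right-hand word also share a tree root."""
    pairs = Counter()
    for left, right in zip(words, words[1:]):
        pairs[(right, left)] += 1
    return pairs

def relation_scores(words):
    """Accumulate a relatedness score for every pair of words that share a
    proof (a common right-hand neighbour), normalized by total counts."""
    pairs = build_reversed_pairs(words)
    totals = Counter(words)
    # group the followers found under each shared root word
    followers = defaultdict(dict)
    for (root, w), c in pairs.items():
        followers[root][w] = c

    relation = defaultdict(float)
    for root, ws in followers.items():
        items = list(ws.items())
        for i, (a, ca) in enumerate(items):
            for b, cb in items[i + 1:]:
                ta, tb = totals[a], totals[b]
                na, nb = float(ca), float(cb)
                # scale the rarer word's pair count up by its rarity ratio
                # (the post's "dog is 2x rarer, so 2 becomes 2*2 = 4")
                if ta < tb:
                    na *= tb / ta
                else:
                    nb *= ta / tb
                shared = min(na, nb)          # what the two words share here
                score = shared / max(ta, tb)  # stored as a PART of the total
                relation[(a, b)] += score     # more shared proofs add score
                relation[(b, a)] += score
    return relation
```

For example, on the toy corpus "dog sleeps cat sleeps dog sleeps", dog and cat share the root "sleeps" and so receive a nonzero mutual score, while dog and sleeps (which never share a right-hand neighbour) stay at zero.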
Lastly, I also downrank proofs that are too common; this improved the score. Rare words help prove that 2 words are related, while common words have much less effect.

I just realized something (?): I have read a lot about Transformers, but no one seems to have explained something I only understood tonight. The token embedding step (after Byte Pair Encoding produces the token IDs) is NOT only about placing related words near each other; the dimensional vectors also store a word's syntax. So "cat" can sit in a "space" near animal-related (semantic) words, while the same vector simultaneously holds info placing "cat" near words that share its syntactic role (not related words). Then, as these embeddings flow up the Transformer, information keeps being added, making the last vector of the user's prompt (apparently only that vector is used to get the next word after all the computation is done) more and more specialized and clear, until at the end it is unembedded: checked against every word vector in the vocab. So if the prompt had 100 animal names to the left of it, it will predict those (not just a next word, but a related word), while if it mostly had syntactic sentence-flow words around it, it will predict one of that type, correspondingly based on the sentence of course. So it is not doing priming "and" syntax conditioning "and" related-word "mechanisms" as separate steps; it is just using dimensional vectors that already contain both types of information.

Hmm, let's compare.

My way:
1. Build the syntax tree.
2. Build the semantic tree.
Then, at prediction time: 1. translate the last word(s) to get next words (the tree search already works without translation); 2. translate the last words to vote on next words.

Their way:
1. Build dimensional syntax vocab vectors.
2. Also build the semantics into those same dimensional vectors (still a separate method though, so still a 1 & 2? Or not?).

But then it does seem to do a few more things here. It uses self-attention (each word looks at each word), which lets the last token's vector "become" either a semantic thing (if all the words are just related words, this makes sense and is OK) or a syntactic one (if the words to its left are not related words). So yes, it is doing priming and next-word prediction in one go. But then, after this self-attention, they also send the vectors up into an FFN and stuff like that, which is another step. I don't know exactly why, yet.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T2d9ee7e1ee2cd20c-Mbc6d3e00e545d91e44c7e253
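The downranking of too-common proofs mentioned above could look something like an inverse-frequency weight. The post doesn't give the actual formula, so this is only a guessed sketch with names of my own:

```python
import math

def proof_weight(proof_count, total_tokens):
    """Weight a proof (a shared context word) by its rarity:
    rare words are strong evidence that two words are related,
    very common words contribute much less.
    An IDF-style guess; the post doesn't state the real formula."""
    return math.log(total_tokens / proof_count)
```

Under this weighting, a proof word seen 10 times in a 1,000,000-token dataset counts for far more than one seen 100,000 times, and a word that appears everywhere contributes nothing.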
