https://agi.topicbox.com/groups/agi/T0518db1e3a0c25c5/preprocessor-for-hutter-prize

Good messages here, I just saw them now and still need to read them. Yes (lol), "water water" is rare; I also have mine lower it right after it sees a word, to prime it more strongly.
Matt, here is how my semantic model works, below. Is mine better or worse than yours?

My current implementation just builds a 2-byte-depth tree. The first breadth (the root bytes) is the word on the right side of each word pair (e.g. "dog sleeps"): I take every 2 adjacent words in the dataset and swap their order, so I can store them as "sleeps dog". This makes building the word relations faster, because the root starts with the shared proofs first.

After the tree is built, I have a list of 256 lists, each containing 256 zeros. This holds the relation score (0.0-1.0) for every byte to every byte. Then I check every root byte. Say the root is "sleeps", and in the mini tree we have dog, cat, horse, and human after it. I go to the 256 lists we made and give each of dog, cat, horse, and human a score against each other, since they are what we have at hand after this root byte.

The score is calculated this way. How many counts does dog have in total? Say 550. How many does cat have? 1000. If dog is lower, I normalize it; in this example dog is about 2x rarer in the dataset. Then I check how many counts they share here in the mini tree: say we have "sleeps dog" x2 and "sleeps cat" x10. Since dog has fewer total counts, I scale "sleeps dog" up artificially from 2 to 2*2 = 4. So now dog and cat share 4. I store this in dog's list as 4 / total counts, i.e. as a PART of the final amount. Because if dog and cat share 4 but dog has 1 billion counts in the dataset (and so does cat, after normalization), then while they share 4 they really share almost nothing; if they shared 100% they would share 1 billion as well. We come back to dog's list at the NEXT proof: if dog and cat share more proofs, we add more score on top of the score we just saved. That was only 1 part so far, I'm saying.
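Here is a minimal word-level sketch of the scoring described above. All names are mine, not from the actual implementation (which works on bytes in a 2-byte-depth tree), and this omits the downranking of too-common proofs mentioned next:

```python
from collections import Counter, defaultdict

def build_reversed_pairs(words):
    """Store every adjacent word pair reversed: 'dog sleeps' -> ('sleeps', 'dog'),
    so pairs that share a right-hand word also share a tree root."""
    pairs = Counter()
    for left, right in zip(words, words[1:]):
        pairs[(right, left)] += 1
    return pairs

def relation_scores(words):
    """Accumulate a relatedness score for every pair of words that share a
    proof (a common right-hand neighbour), normalized by total counts."""
    pairs = build_reversed_pairs(words)
    totals = Counter(words)
    # group the followers found under each shared root word
    followers = defaultdict(dict)
    for (root, w), c in pairs.items():
        followers[root][w] = c

    relation = defaultdict(float)
    for root, ws in followers.items():
        items = list(ws.items())
        for i, (a, ca) in enumerate(items):
            for b, cb in items[i + 1:]:
                ta, tb = totals[a], totals[b]
                na, nb = float(ca), float(cb)
                # scale the rarer word's pair count up by its rarity ratio
                # (the post's "dog is 2x rarer, so 2 becomes 2*2 = 4")
                if ta < tb:
                    na *= tb / ta
                else:
                    nb *= ta / tb
                shared = min(na, nb)          # what the two words share here
                score = shared / max(ta, tb)  # stored as a PART of the total
                relation[(a, b)] += score     # more shared proofs add score
                relation[(b, a)] += score
    return relation
```

For example, on the toy corpus "dog sleeps cat sleeps dog sleeps", dog and cat share the root "sleeps" and so receive a nonzero mutual score, while dog and sleeps (which never share a right-hand neighbour) stay at zero.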
Lastly, I also downrank proofs that are too common; this improved the score. Rare words help prove that 2 words are related, while common words have much less effect.

I just realized something (?): I have read a lot about Transformers, but no one seems to have explained something I only understood tonight. The token embedding step (after Byte Pair Encoding produces the token IDs) is NOT only about placing related words near each other; the dimensional vectors also store a word's syntax. So "cat" can sit in a "space" near animal-related (semantic) words, while the same vector simultaneously holds info placing "cat" near words that share its syntactic role (not related words). Then, as these embeddings flow up the Transformer, information keeps being added, making the last vector of the user's prompt (apparently only that vector is used to get the next word after all the computation is done) more and more specialized and clear, until at the end it is unembedded: checked against every word vector in the vocab. So if the prompt had 100 animal names to the left of it, it will predict those (not just a next word, but a related word), while if it mostly had syntactic sentence-flow words around it, it will predict one of that type, correspondingly based on the sentence of course. So it is not doing priming "and" syntax conditioning "and" related-word "mechanisms" as separate steps; it is just using dimensional vectors that already contain both types of information.

Hmm, let's compare.

My way:
1. Build the syntax tree.
2. Build the semantic tree.
Then, at prediction time: 1. translate the last word(s) to get next words (the tree search already works without translation); 2. translate the last words to vote on next words.

Their way:
1. Build dimensional syntax vocab vectors.
2. Also build the semantics into those same dimensional vectors (still a separate method though, so still a 1 & 2? Or not?).

But then it does seem to do a few more things here. It uses self-attention (each word looks at each word), which lets the last token's vector "become" either a semantic thing (if all the words are just related words, this makes sense and is OK) or a syntactic one (if the words to its left are not related words). So yes, it is doing priming and next-word prediction in one go. But then, after this self-attention, they also send the vectors up into an FFN and stuff like that, which is another step. I don't know exactly why, yet.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T2d9ee7e1ee2cd20c-Mbc6d3e00e545d91e44c7e253
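The downranking of too-common proofs mentioned above could look something like an inverse-frequency weight. The post doesn't give the actual formula, so this is only a guessed sketch with names of my own:

```python
import math

def proof_weight(proof_count, total_tokens):
    """Weight a proof (a shared context word) by its rarity:
    rare words are strong evidence that two words are related,
    very common words contribute much less.
    An IDF-style guess; the post doesn't state the real formula."""
    return math.log(total_tokens / proof_count)
```

Under this weighting, a proof word seen 10 times in a 1,000,000-token dataset counts for far more than one seen 100,000 times, and a word that appears everywhere contributes nothing.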
