I've been playing with the notion that syntax and semantics are simply
opposing directions in a Grangeresque higher-order pushdown automaton
(HOPDA) grammar. HOPDAs have the virtue that their quasi-context
sensitivity lends them to what I like to call the UTM fictions we all
indulge in as we program computers.

On Fri, Jan 23, 2026 at 12:36 AM <[email protected]> wrote:

>
> https://agi.topicbox.com/groups/agi/T0518db1e3a0c25c5/preprocessor-for-hutter-prize
> Good messages here. I just saw them now. Still need to read them.
> Yes (lol), "water water" is rare. I also have mine lowering the score
> there right after it sees a word, to prime it more strongly.
>
> Matt, here is how my semantic model works (below). Is mine better or
> worse than yours?
>
> My current implementation just builds a 2-byte-depth tree. The first
> breadth (the root bytes) holds the word on the right side of each word
> pair (e.g. "dog sleeps"): I take every 2-word pair in the dataset and
> swap the order, storing "sleeps dog" instead. This makes building the
> word relations faster, because the root starts with the shared proofs
> first.
>
> After the tree is built, I keep a list of 256 lists, each holding 256
> zeros. This is the relation score (0.0 to 1.0) from every byte to every
> byte. Now I check every root byte. Say the root is "sleeps": if dog,
> cat, horse, and human appear after it in the mini tree, I go to those
> 256 lists and visit the entries for dog, cat, horse, and human, since
> those are the words at hand under this root, and I give each pair a
> score.
>
> The score is calculated like this. How many counts does dog have in the
> dataset? Say 550. How many does cat have? Say 1000. If dog's count is
> lower, I normalize it: in this example dog is about 2x rarer in the
> dataset. Then I check how many each shares under this root in the mini
> tree. Say we have "sleeps dog" x2 and "sleeps cat" x10. Since dog's
> total count is lower, I scale "sleeps dog" up artificially from 2 to
> 2*2 = 4. So dog and cat now share 4. I store this in dog's list as
> 4 / total counts, because if dog and cat share 4 but dog has 1 billion
> counts in the dataset (and so does cat after normalization), then
> despite sharing 4 they really share almost nothing; if they shared 100%
> they would share 1 billion as well. So it is stored as a PART of the
> final amount. We come back to dog's list at the NEXT proof: if dog and
> cat share more proofs, we add more score on top of the score we just
> saved (so far we have only done 1 proof). Lastly, I also downrank
> proofs that are too common, which improved my score: rare words help
> prove 2 words are related, while common words have much less effect.
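A rough Python sketch of the scoring scheme described above, as I read it. All names, counts, and the exact downranking formula are mine and purely illustrative, not the author's real code:

```python
import math
from collections import defaultdict

# word_count[w]  = total occurrences of w in the dataset (toy numbers)
# tree[root][w]  = how often w precedes the root word (pairs stored reversed,
#                  e.g. tree["sleeps"]["dog"] counts "dog sleeps")
word_count = {"dog": 550, "cat": 1000, "sleeps": 300}
tree = {"sleeps": {"dog": 2, "cat": 10}}

relation = defaultdict(float)  # relation[(a, b)] = accumulated 0..1 score

for root, followers in tree.items():
    # Downrank common "proof" words: a rare shared context is stronger
    # evidence of relatedness than a very common one (assumed formula).
    proof_weight = 1.0 / (1.0 + math.log(word_count[root]))
    words = sorted(followers)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            ca, cb = word_count[a], word_count[b]
            # Scale the rarer word's co-occurrence up by the rarity ratio
            # (dog is ~2x rarer than cat, so its 2 becomes ~4 here).
            ratio = max(ca, cb) / min(ca, cb)
            if ca < cb:
                shared = min(followers[a] * ratio, followers[b])
            else:
                shared = min(followers[a], followers[b] * ratio)
            # Store as a fraction of the total counts, so a huge total
            # count with a tiny overlap scores near zero.
            score = (shared / max(ca, cb)) * proof_weight
            # Each new shared proof adds on top of the saved score.
            relation[(a, b)] += score
            relation[(b, a)] += score
```

The key properties from the description are all here: rarity normalization of the shared count, storing the share as a fraction of the totals, accumulation across proofs, and downranking of common proof words.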
>
> I just realized something: I have read a lot about Transformers, but no
> one seems to have explained something I only understood tonight. The
> token embedding step (after Byte Pair Encoding produces token IDs) does
> NOT only store related-word dimensional vectors; the vectors also store
> the word's syntax. So "cat" can sit in a "space" near animal-related
> (semantic) words, but the same vector can simultaneously place "cat"
> near words that share its syntax (not related words). Then, as these
> embeddings flow up the Transformer, information keeps being added,
> making the last vector of the user's prompt (apparently only that
> vector is used to get the next words once all computation is done) more
> and more specialized and clear. At the last step it is unembedded: that
> vector is checked against every word vector in the vocab. So if it had
> 100 animal names to its left in the prompt, it will predict those (not
> a "next" word but a related word), while if it had few animal words
> nearby and mostly syntactic sentence-flow words, it will predict one of
> that type, conditioned on the sentence of course. So it is not doing
> priming "and" syntax conditioning "and" related-word "mechanisms" as
> separate things; it is just using dimensional vectors that carry both
> types already.
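A minimal sketch of that unembedding step: the last position's vector is compared against every vocabulary vector and the highest-scoring word wins. The 4-d vectors below are toy values I made up (not real learned embeddings); the point is that one vector can carry semantic dimensions (first two) and syntactic dimensions (last two) at the same time:

```python
# Toy vocab: columns are [animal, pet, verb, determiner] (invented dims).
vocab = {
    "dog":    [0.9, 0.8, 0.0, 0.0],
    "cat":    [0.9, 0.9, 0.0, 0.0],
    "sleeps": [0.1, 0.0, 0.9, 0.0],
    "the":    [0.0, 0.0, 0.0, 0.9],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Pretend the Transformer produced this vector at the last prompt position,
# pushed toward the animal/pet region by an animal-heavy context.
last_vector = [0.8, 0.85, 0.1, 0.0]

# Unembed: score every vocab word against the last vector.
scores = {w: dot(last_vector, v) for w, v in vocab.items()}
best = max(scores, key=scores.get)
```

With a context vector in the animal region, `best` comes out as a related word ("cat" here) rather than a syntax word like "the"; a vector pushed into the syntactic dimensions would flip that.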
>
> hmm let's compare:
>
> my way:
> 1. build the syntax tree
> 2. build the semantic tree
> then, at run time:
> 1. translate the last word(s) to get next words (the untranslated form
> is already included, so the tree can be searched directly)
> 2. translate the last words to vote on next words
>
> their way:
> 1. build dimensional syntax vocab vectors
> 2. build the semantics into those same dimensional vectors, still
> using a separate method
> 1&2 (and more?): but then it does seem to do a few more things. It uses
> self-attention (each word looks at each word), which lets the last
> token's vector "become" either a semantic thing (if all the words are
> just related words, this makes sense and is fine) or a syntactic word
> (if the words to its left are not related words). Yes, it is doing
> priming and next-word prediction in one go, but after this
> self-attention they also send the vectors up into an FFN and further
> steps like that. I don't know why yet, exactly.
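For the self-attention part: a toy single-head sketch (no learned projections, invented 3-d vectors) showing how the last token's vector becomes a weighted mix of every token's vector, blending the semantic and syntactic information in the prompt:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy embeddings for "the cat sleeps" (made-up 3-d values).
tokens = [[0.0, 0.1, 0.9],   # "the"    (mostly syntax-like dims)
          [0.9, 0.8, 0.1],   # "cat"    (mostly semantic dims)
          [0.2, 0.7, 0.5]]   # "sleeps"

d = len(tokens[0])
query = tokens[-1]  # the last token attends over all tokens

# Scaled dot-product attention weights, then the weighted mix.
weights = softmax([dot(query, k) / math.sqrt(d) for k in tokens])
attended = [sum(w * tok[i] for w, tok in zip(weights, tokens))
            for i in range(d)]
# `attended` now carries context from the whole prompt into the last
# position; that is the vector the FFN layers keep refining.
```

In this toy run the last token attends more to "cat" than to "the", so the mixed vector drifts toward the semantic region, which matches the "become a semantic thing" intuition above.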
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T2d9ee7e1ee2cd20c-M81c5b458835058372484bf81
Delivery options: https://agi.topicbox.com/groups/agi/subscription