Agree. Neural networks model associative memory, which is why they work. A language model maps several tokens of short-term memory to a vector of next-token probabilities. A process that inputs text into short-term memory, retaining low-frequency tokens longer, and samples its output from this distribution after training on a lifetime's worth of human writing, passes the Turing test and is indistinguishable from a conscious being.
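As a rough illustration of "several tokens of short-term memory mapped to a vector of next-token probabilities", here is a minimal count-based sketch. It is not the model from any actual entry: the function names `train` and `next_token_probs`, the fixed context length `order`, and the add-`alpha` smoothing are all assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict


def train(tokens, order=2):
    """Count which token follows each context of `order` preceding tokens."""
    counts = defaultdict(Counter)
    for i in range(order, len(tokens)):
        context = tuple(tokens[i - order:i])
        counts[context][tokens[i]] += 1
    return counts


def next_token_probs(counts, context, vocab, alpha=1.0):
    """Return a dict of next-token probabilities for the given context.

    Add-alpha smoothing (an assumption of this sketch) gives every token
    in `vocab` a nonzero probability, even in an unseen context.
    """
    seen = counts.get(tuple(context), Counter())
    total = sum(seen.values()) + alpha * len(vocab)
    return {tok: (seen[tok] + alpha) / total for tok in vocab}


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat sat on the rug".split()
    vocab = sorted(set(tokens))
    counts = train(tokens, order=2)
    probs = next_token_probs(counts, ["the", "cat"], vocab)
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"{tok:>4} {p:.3f}")
```

A real model would mix many context lengths and weight them adaptively; the point here is only the shape of the mapping, context in and probability vector out.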
We can measure text prediction accuracy by replacing the output with an arithmetic coder and measuring the compressed size. If the process is deterministic, as in a computer but not the human brain, then we can verify the algorithm by decompressing and comparing with the input used to train it.

The next step in my Hutter Prize entry is to write some English-specific rules for parsing morphology into base words and suffixes like -S, -ED, -ING, build a dictionary with a 30K to 50K vocabulary of common tokens encoded using 1 or 2 bytes, and feed it into a context model with token-aligned contiguous and sparse contexts that model short-term memory. My earlier experiments using byte pair encoding already discovered the common suffixes, and I think the same approach would apply to other languages.

-- Matt Mahoney, [email protected]

On Sat, Apr 4, 2026, 10:54 PM James Bowery <[email protected]> wrote:

> https://elanbarenholtz.github.io/#evidence
>
> This guy seems to be on the right track.
>
> The Morphosyntax Experiment
>
> If syntax is distributional structure over high-leverage tokens, then
> function words (THE, WAS, TO) and morphology (-ING, -ED, -LY) should
> constrain predictions even when surrounded by nonsense. We tested this by
> measuring next-token entropy in language models across four conditions:
>
> Real Sentences:
> "The teacher was explaining the concept clearly"
>
> Jabberwocky:
> "The blicket was florping the daxen grentily" (function words + morphology
> intact)
>
> Stripped:
> "Ke blicket nar florp ke daxen grenti" (all nonwords, no morphology)
>
> Random Nonwords:
> Completely unstructured
>
> *Results:*
>
> Sentences (7.45 bits) < Jabberwocky (8.04) < Stripped (9.07) < Random
> (9.27)
>
> Morphosyntax alone reduces entropy by ~1 bit (p < 0.0001, d = -1.75).
> Function words and morphological markers constrain prediction independently
> of semantic content — exactly as predicted by the distributional account.
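The two steps described in the reply above (splitting words into base plus suffix tokens, then scoring a predictor by compressed size) can be sketched as follows. This is a toy under stated assumptions, not the planned Hutter Prize code: the suffix list, the minimum base length of 3, the names `split_suffix`, `tokenize`, and `ideal_code_length_bits`, and the add-`alpha` adaptive model are all hypothetical. Instead of running a real arithmetic coder, the sketch sums -log2 p over the tokens, which is the size an arithmetic coder approaches to within a few bits.

```python
import math
from collections import Counter, defaultdict

# Assumption: a toy rule set; real English morphology needs many more rules.
SUFFIXES = ("ing", "ed", "s")


def split_suffix(word):
    """Split a word into [base, -SUFFIX] with naive rules, else [word]."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return [word[:-len(suffix)], "-" + suffix.upper()]
    return [word]


def tokenize(text):
    """Lowercase, split on whitespace, and separate suffixes from bases."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(split_suffix(word))
    return tokens


def ideal_code_length_bits(tokens, vocab_size, order=1, alpha=1.0):
    """Sum of -log2 p(token | previous `order` tokens) under an adaptive model.

    Counts are updated only after each token is scored, so a decoder that
    starts from the same empty state could reproduce every probability.
    """
    counts = defaultdict(Counter)
    bits = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - order):i])
        seen = counts[context]
        p = (seen[tok] + alpha) / (sum(seen.values()) + alpha * vocab_size)
        bits -= math.log2(p)
        seen[tok] += 1
    return bits


if __name__ == "__main__":
    text = "the teacher was explaining the concept and the students asked questions"
    tokens = tokenize(text)
    bits = ideal_code_length_bits(tokens, vocab_size=len(set(tokens)))
    print(tokens)
    print(f"{bits:.1f} bits for {len(tokens)} tokens")
```

The naive rules mis-split words such as "glass", which is one reason a dictionary of known base words would be needed alongside them. Updating the counts only after scoring is what keeps the model deterministic and decodable, so the result could be verified by decompressing and comparing with the input.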
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T33aa28f274d02422-M57e9cc10a76d8f35dca6e8a1
Delivery options: https://agi.topicbox.com/groups/agi/subscription
