Agree. Neural networks model associative memory, which is why they work. A
language model maps several tokens of short-term memory to a vector of
next-token probabilities. A process that reads text into short-term memory,
retaining low-frequency tokens longer, and samples its output from this
distribution after training on a lifetime's worth of human writing passes
the Turing test and is indistinguishable from a conscious human.

We can measure text prediction accuracy by replacing the output with an
arithmetic coder and measuring the compressed size. If the process is
deterministic, as in a computer but not the human brain, then we can verify
the algorithm by decompressing and comparing with the input used to train
it.
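
As a rough sketch of that measurement: an arithmetic coder spends very
close to -log2(p) bits on a symbol the model predicted with probability p,
so the compressed size can be estimated by summing that cost over the
text. The adaptive order-0 byte model below is only a stand-in I am
assuming for illustration; it is not the context model from the entry, and
the function name is made up.

```python
import math

def ideal_compressed_bits(data: bytes) -> float:
    """Estimate the arithmetic-coded size of data, in bits.

    Assumption: a simple adaptive order-0 byte model stands in for the
    real predictor. A real arithmetic coder would add a small constant
    overhead on top of this sum.
    """
    counts = [1] * 256   # start every byte at count 1 so no p is zero
    total = 256
    bits = 0.0
    for b in data:
        p = counts[b] / total      # model's probability for the byte seen
        bits += -math.log2(p)      # ideal code length for that byte
        counts[b] += 1             # update after coding, so a decoder
        total += 1                 # can mirror the same model state
    return bits

if __name__ == "__main__":
    text = b"the cat sat on the mat and the cat sat on the hat"
    print(f"{ideal_compressed_bits(text) / len(text):.2f} bits per byte")
```

Because the model is updated only after each byte is coded, a decoder that
starts from the same counts reproduces the same probabilities, which is
what makes the decompress-and-compare check possible.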

The next step in my Hutter prize entry is to write some English-specific
rules for parsing morphology into base words and suffixes like -S, -ED,
-ING, build a dictionary with a vocabulary of 30K to 50K common tokens
encoded using 1 or 2 bytes, and feed it into a context model with
token-aligned contiguous and sparse contexts that model short-term memory.
My earlier experiments using byte pair encoding already discovered the
common suffixes, and I think that approach would apply to other languages.
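
To make the tokenization step concrete, here is a minimal sketch under my
own assumptions, not the actual entry. The suffix rule is deliberately
naive (it ignores spelling changes such as MAKING or RUNNING), and the
1-or-2-byte layout, the constant ONE_BYTE, and all function names are
hypothetical. With 64 single-byte codes and the remaining 192 lead bytes
each followed by a second byte, the scheme addresses 64 + 192 * 256 =
49216 tokens, which falls in the 30K to 50K range.

```python
SUFFIXES = ("ING", "ED", "S")   # tried in this order, longest first
ONE_BYTE = 64                   # hypothetical: indices 0..63 use one byte
MAX_TOKENS = ONE_BYTE + (256 - ONE_BYTE) * 256

def split_suffix(word, base_words):
    """Split word into [base, -SUFFIX] if the base is a known word."""
    w = word.upper()
    for suf in SUFFIXES:
        if w.endswith(suf):
            stem = w[: -len(suf)]
            # Require a known base so RED is not parsed as R + -ED.
            if stem in base_words:
                return [stem, "-" + suf]
    return [w]

def encode_index(i):
    """Encode a dictionary index as 1 byte (frequent) or 2 bytes."""
    if not 0 <= i < MAX_TOKENS:
        raise ValueError("index out of range for this layout")
    if i < ONE_BYTE:
        return bytes([i])
    hi, lo = divmod(i - ONE_BYTE, 256)
    # Lead bytes >= ONE_BYTE mark a two-byte code.
    return bytes([ONE_BYTE + hi, lo])

def decode_index(code):
    """Inverse of encode_index."""
    if len(code) == 1:
        return code[0]
    return ONE_BYTE + (code[0] - ONE_BYTE) * 256 + code[1]

if __name__ == "__main__":
    base = {"WALK", "PARSE", "TOKEN"}
    for w in ("walked", "walking", "tokens", "red"):
        print(w, split_suffix(w, base))
```

Sorting the dictionary by frequency before assigning indices would give
the most common tokens the single-byte codes.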

-- Matt Mahoney, [email protected]

On Sat, Apr 4, 2026, 10:54 PM James Bowery <[email protected]> wrote:

> https://elanbarenholtz.github.io/#evidence
>
> This guy seems to be on the right track.
>
> The Morphosyntax Experiment
>
> If syntax is distributional structure over high-leverage tokens, then
> function words (THE, WAS, TO) and morphology (-ING, -ED, -LY) should
> constrain predictions even when surrounded by nonsense. We tested this by
> measuring next-token entropy in language models across four conditions:
> Real Sentences:
> "The teacher was explaining the concept clearly"
> Jabberwocky:
> "The blicket was florping the daxen grentily" (function words + morphology
> intact)
> Stripped:
> "Ke blicket nar florp ke daxen grenti" (all nonwords, no morphology)
> Random Nonwords:
> Completely unstructured
>
> *Results:*
>
> Sentences (7.45 bits) < Jabberwocky (8.04) < Stripped (9.07) < Random
> (9.27)
>
> Morphosyntax alone reduces entropy by ~1 bit (p < 0.0001, d = -1.75).
> Function words and morphological markers constrain prediction independently
> of semantic content — exactly as predicted by the distributional account.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T33aa28f274d02422-M57e9cc10a76d8f35dca6e8a1
Delivery options: https://agi.topicbox.com/groups/agi/subscription
