[agi] Why is AGI so hard?

Matt Mahoney Sat, 21 Mar 2026 10:28:12 -0700

Yes, I know I wrote a paper in 2013 estimating that automating the
economy with AGI would cost $1 quadrillion, mostly to collect 10^17
bits of human knowledge. This proved correct when it took companies
with trillion dollar market caps to produce LLMs. Those are the ones
that have access to your emails, texts, and social media posts that go
far beyond the 10^13 bits you can suck off the public internet. Even
so, we are less than 1% of the way there, which is why AI has not put
a dent in the employment rate yet.

But that's not what I'm trying to build. I'm building a human-level
small language model (SLM). It's not a probabilistic logic knowledge
base like the ones that Ben Goertzel, YKY, and Pei Wang were
developing before they left the group when LLMs proved in 2023 that
all you need to pass the Turing test is text prediction, like I
predicted in a 1999 paper. That's basically the Hutter prize. I do
appreciate that 2 of the 3 Hutter prize committee members (me and
James Bowery) are still active here, and others (Immortal Discoveries,
or submerge on encode.su) are pursuing this approach as well.

My math mostly agrees with Turing's 1950 prediction that a computer
with 10^9 bits of memory, but no faster than current technology
(mechanical relays are as fast as neurons) would win the imitation
game (now known as the Turing test) by 2000. His forecast of Moore's
law was remarkably prescient, given that Gordon Moore didn't state it
until 1965. Turing's paper was published just after Shannon invented
information theory and estimated the entropy of English at about 1 bit
per character, consistent with the top results on my large text
benchmark. It also predates Landauer's 1986 estimate of 10^9 bits of
human long-term memory capacity, although Turing could have easily
estimated how many words we process in a lifetime.
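
As a rough illustration of that kind of estimate, here is a
back-of-envelope sketch. The reading rate, hours of language exposure
per day, number of years, and characters per word are my own
illustrative assumptions, not figures from Turing or Landauer. Only
the 1 bit per character comes from the Shannon estimate above.

```python
def lifetime_text_bits(words_per_minute=150,   # assumed, hypothetical
                       hours_per_day=4,        # assumed, hypothetical
                       years=20,               # assumed, hypothetical
                       chars_per_word=6,       # assumed, includes the space
                       bits_per_char=1.0):     # Shannon's rough estimate
    """Return (words processed, information content in bits)."""
    words = words_per_minute * 60 * hours_per_day * 365 * years
    bits = words * chars_per_word * bits_per_char
    return words, bits

words, bits = lifetime_text_bits()
print(f"words processed: {words:.2e}")
print(f"information:     {bits:.2e} bits")
```

With these inputs the result lands between 10^9 and 10^10 bits, the
same order of magnitude as the memory estimates above. Changing the
assumed hours per day moves the answer proportionally.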

My math says an SLM can be implemented on a single CPU at 10,000 x real
time, compressing a lifetime of learning into a day. You have a
vocabulary of about 50K tokens with a Zipf distribution, where the
nth most frequent word has a frequency of about 0.1/n. You have a
short-term memory of about 7 tokens, where low frequency tokens
persist longer. You have a 50K by 50K matrix mapping short-term memory
to the predicted token, with the sparse parts of the matrix
implemented as hidden layers in a neural network to cut the parameter
space to 10^9. Updates should be fast because the learning rate is
only about 4 bits per token, so only a small number of parameters need
to be updated. Predictions should likewise be fast if we implement an
attention mechanism in the hidden layer (like in transformers), where
all but the few most active neurons are set to 0.
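
A minimal sketch to check the numbers in this paragraph, not the
actual model. It sums the 0.1/n frequencies over a 50K vocabulary,
counts the entries in a dense 50K by 50K matrix, and shows one
possible way to zero all but the k most active units. The function
names and the choice of "largest value" as the meaning of "most
active" are my assumptions.

```python
import heapq

VOCAB = 50_000  # vocabulary size from the paragraph above

def zipf_mass(vocab=VOCAB, scale=0.1):
    """Total of scale/n over ranks n = 1..vocab."""
    return sum(scale / n for n in range(1, vocab + 1))

def top_k_sparsify(activations, k):
    """Keep the k largest activations and set the rest to 0.

    Assumption: 'most active' means largest value, not largest magnitude.
    """
    keep = set(heapq.nlargest(k, range(len(activations)),
                              key=activations.__getitem__))
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]

print(f"sum of 0.1/n over {VOCAB} ranks: {zipf_mass():.3f}")
print(f"dense matrix entries: {VOCAB * VOCAB:.1e}")
print(top_k_sparsify([0.1, 3.0, -2.0, 5.0], k=2))
```

The frequencies sum to about 1.14 rather than 1, so 0.1/n is an
approximation that would need normalizing before use as a
probability. The dense matrix has 2.5 x 10^9 entries, which is the
count the hidden layers are meant to bring down to 10^9 parameters.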

But it is still hard. I suppose if it weren't, we would have solved AI
23 years earlier. Two months ago I released a version that compressed
enwik9 to 145 MB in 10 minutes using article sorting by topic, XML
unwrapping, capitalization encoding, a tiny dictionary, and a pure
linear context model. The plan is to mix these predictions with the
language model, which I have yet to write. Instead I spent the last 2
months refining the context model. I had a bunch of ideas to
dramatically improve speed or memory usage, but ended up spending days
implementing and debugging them, only to see that each one either
didn't work or improved things so marginally that it wasn't worth the
effort. As the program
grows, each update is like brain surgery, carefully changing 1 or 2
lines and testing in case I broke something and have to go back. In 2
months, all I have to show for it is 142 MB in 20 minutes, a tiny
movement along the Pareto frontier that isn't even worth releasing. I
need to get to 110 MB, but as I do, testing times will go from minutes
to hours to days.
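
To put those sizes on a common scale, the sketch below converts each
compressed size to bits per input byte. It assumes enwik9 is exactly
10^9 bytes and that MB means 10^6 bytes. The function name is
hypothetical.

```python
ENWIK9_BYTES = 10**9  # enwik9 is the first 10^9 bytes of a Wikipedia dump

def bits_per_byte(compressed_mb, original_bytes=ENWIK9_BYTES):
    """Compressed size in bits divided by the original size in bytes."""
    return compressed_mb * 1e6 * 8 / original_bytes

for label, mb in [("released version", 145),
                  ("current version", 142),
                  ("target", 110)]:
    print(f"{label:17s} {mb} MB -> {bits_per_byte(mb):.3f} bits per byte")
```

That gives about 1.16, 1.14, and 0.88 bits per byte. These are not
directly comparable to an entropy estimate for plain English, because
enwik9 also contains XML and wiki markup.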

There's something I'm not getting. Why does the brain need 10^15
synapses to store 10^9 bits? Maybe it's a speed optimization, like how
a server farm has a million copies of Linux, or your body has 10^13
copies of your DNA. Or is it something else? Is it the reason we
didn't solve AI in 2000?

-- 
-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tc9fe35df94409188-M580dec3e9f299a9aad2af686
