On Sat, Aug 30, 2025 at 3:24 AM Matt Mahoney <[email protected]>
wrote:

> On Thu, Aug 28, 2025, 4:28 AM Rob Freeman <[email protected]>
> wrote:
>
>> But where are you getting the idea that LLMs correlate closely with
>> spike-rate-based neural models, Matt? You make it sound like it is settled
>> neuroscience.
>>
>
> What do you think the activation level in an artificial neuron represents?
>

How are LLM weights represented in the neural context? Personally, I think
the corresponding quantities in the brain are probably neural groupings.

To make an analogy: if you take a population of people, you might represent
the distribution of their ethnicities by assigning "weights" to a set of
"ethnic" symbols. Then you replace the population with ethnic symbols
carrying different weights. But the choice of symbols is always going to be
subjective. It's much more powerful to keep the population, and group it to
generate different "weights"/symbols as needed.
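To make that concrete, here's a toy sketch (names, attributes, and counts
all invented for illustration): summarizing the population as fixed weights
bakes in one choice of symbol, while keeping the population lets you
regroup and generate different weights on demand.

```python
from collections import Counter

# Hypothetical population: every attribute here is invented for illustration.
population = [
    {"name": "a", "ethnicity": "X", "language": "L1"},
    {"name": "b", "ethnicity": "X", "language": "L2"},
    {"name": "c", "ethnicity": "Y", "language": "L1"},
    {"name": "d", "ethnicity": "Y", "language": "L1"},
]

# Fixed summary: replace the population with symbol weights.
# The choice of symbol ("ethnicity") is baked in and cannot be revisited.
weights = Counter(p["ethnicity"] for p in population)

# Keeping the population lets us generate different weights on demand,
# grouped by whatever attribute the current context calls for.
def regroup(pop, attribute):
    return Counter(p[attribute] for p in pop)

print(weights)                          # the one frozen grouping
print(regroup(population, "language"))  # a different grouping, on demand
```

The point of the sketch: once you discard the population for the weights,
only the first grouping survives; keeping the individuals keeps every
possible grouping available.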

The only question is how you do the grouping.

And the answer to that for language models can be the same as the way it's
done for LLMs now: you group (rather than generating weights) according to
how elements share contexts. That's the mechanism that has worked for LLMs.
And it's been a big revolution in AI. (Why? I think exactly because it
allows representation to change and form different groupings.)
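A toy sketch of what I mean by grouping by shared contexts (corpus and
words invented for illustration): elements that occur in the same contexts
fall into the same group, with no weights learned at all.

```python
# Toy corpus of (left, word, right) triples, invented for illustration.
corpus = [
    ("the", "cat", "sat"),
    ("the", "dog", "sat"),
    ("a", "cat", "ran"),
    ("a", "dog", "ran"),
    ("the", "mat", "was"),
]

# Collect each word's set of contexts: the (left, right) pairs it occurs in.
contexts = {}
for left, word, right in corpus:
    contexts.setdefault(word, set()).add((left, right))

def shared(a, b):
    """Fraction of contexts the two elements share (Jaccard overlap)."""
    ca, cb = contexts[a], contexts[b]
    return len(ca & cb) / len(ca | cb)

# "cat" and "dog" occur in exactly the same contexts, so they group
# together; "mat" shares no contexts with them.
print(shared("cat", "dog"))  # 1.0
print(shared("cat", "mat"))  # 0.0
```

Nothing is trained here: the grouping is read directly off the shared
contexts, and could be re-read differently if the corpus changed.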

The difference with the "grouping" model is that you don't fix the
"weights" rigidly at a single training time. If you think of weights
instead as groupings, you can make new groupings all the time, as needed.

So I don't think you need a model that equates spike rate with weights. You
can have a model that equates groupings with weights. And a
grouping-to-weight model corresponds better to a phase-based code (group
neurons according to how their phases synchronize, or order in some other
physical way dependent on context).
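As a toy sketch of the phase idea, assuming a simple Kuramoto-style model
(couplings, step sizes, and tolerances all invented for illustration):
oscillators that are coupled, as if by shared context, pull each other into
phase, and groupings can then be read off as clusters of synchronized
phases.

```python
import math

# One Euler step of a Kuramoto-style phase model: each oscillator is
# pulled toward the phases of the oscillators it is coupled to.
def kuramoto_step(phases, coupling, dt=0.1):
    n = len(phases)
    new = []
    for i in range(n):
        pull = sum(coupling[i][j] * math.sin(phases[j] - phases[i])
                   for j in range(n))
        new.append((phases[i] + dt * pull) % (2 * math.pi))
    return new

# Neurons 0 and 1 "share context" (strong coupling); neuron 2 is uncoupled.
coupling = [[0, 2, 0],
            [2, 0, 0],
            [0, 0, 0]]
phases = [0.0, 1.0, 2.5]
for _ in range(200):
    phases = kuramoto_step(phases, coupling)

def grouped(i, j, tol=0.1):
    d = abs(phases[i] - phases[j]) % (2 * math.pi)
    return min(d, 2 * math.pi - d) < tol

print(grouped(0, 1))  # True: coupled oscillators synchronize
print(grouped(0, 2))  # False: the uncoupled one keeps its own phase
```

The "weights" here are nowhere stored: the grouping exists only as the
physical fact of synchronization, and changes if the couplings change.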

So I think weights in the neural implementation will correspond to
different groupings of neurons: neurons physically grouped by their shared
contexts, probably by some resonance mechanism.

In the aggregate, this degenerates to a summarization as weights, as in
LLMs, which "learn" weights based on shared context. But it doesn't rely on
a single "learning" phase (using backprop).


> In the top text compressors that I am familiar with, neurons represent
> features, which in LLMs can be letters, tokens, or grammatical or semantic
> categories. Activation levels are calculated as a weighted sum of inputs
> and then clamped. The network is trained by adjusting the weights to reduce
> prediction errors.
>
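For reference, the model described above (weighted sum, clamped activation,
weights adjusted to reduce prediction error) can be sketched minimally like
this (inputs, target, and learning rate invented for illustration):

```python
# Minimal sketch of a single artificial neuron: weighted sum of inputs,
# activation clamped to [0, 1], weights nudged to reduce prediction error.
def activate(inputs, weights):
    s = sum(x * w for x, w in zip(inputs, weights))
    return max(0.0, min(1.0, s))  # clamp

def train_step(inputs, weights, target, lr=0.1):
    error = activate(inputs, weights) - target
    # Move each weight against the error, scaled by its input.
    return [w - lr * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(100):
    weights = train_step([1.0, 0.5], weights, target=0.8)

print(round(activate([1.0, 0.5], weights), 2))  # converges near the target
```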

The "compression" model, the basis of the Hutter Prize, was always flawed.
Groupings can actually _expand_ a representation: find new orders all the
time. That's what we miss.

The contribution of the Hutter Prize to our knowledge has been barren. It
didn't find the semantic primitives Hutter envisioned for it.

You can take any one of those orders found by one or other (partial)
compression, and call it a "representation" of the system. And as one of
the (infinite) possible orders, it does represent information. The fact of
ordering has been a partial extraction of information. So you get (partial)
success by doing that. But you've missed the point that the orders can be
changed, to create new orders. So you get some value, but miss the essence
of the system. Which is its creativity.

No accident that creativity is the bit that current LLMs do so poorly.
Limited to random "hallucinations" or noise within rigid learned orders.


> It is essentially the model described by Rumelhart and McClelland in the
> 1980s.
>

The insight of distributed representation was the first step, yes.
Rumelhart and McClelland summarized that wonderfully in the '80s. And then
it finally reached full expression when society invested enough (not in
research, but in hardware to play games) for hardware to catch up, around
2010.

But Rumelhart and McClelland, 20 years earlier, yes, wonderful. The insight
of distributed representation. But what was the distributed representation
insight at base? I'm not sure anyone thinks about this. Many people
attribute the success of NNs to "learning". Even today we talk more about
"machine learning" than about distributed representation. The value most
people seem to attribute to the NN revolution is the fact that you can
point a network at a problem and have it produce (some kind of) answer
WITHOUT thinking! No-one thinks about why it works, because people think
the value of the model is that you don't have to think about it! Not
thinking is what's good about it!

I say the true insight in distributed representation, still waiting for us
to see it, was that patterns could vary: conferring robustness, graceful
degradation, etc.

But in NNs circa Rumelhart and McClelland, the variation was still
constrained by fixed patterns: usually patterns fixed, or "supervised",
externally in some way. The 2010 revolution started when NNs discovered
"cats". But you still had to have cats.

The next big jump came in 2017. What was it?

Lots more NOT THINKING applied to this. People don't think much about why
LLMs work at all. (A recent initiative from the Simons Foundation
excepted? “Artificial intelligence tools have rapidly advanced over the
last few years and entered our day-to-day lives, yet we still fundamentally
don’t understand what they’re doing under the hood,” says Collaboration
Director Surya Ganguli:
https://www.simonsfoundation.org/2025/08/18/simons-foundation-launches-collaboration-on-the-physics-of-learning-and-neural-computation/)
They just go straight to whatever API the latest big corp has released this
week... But I say the "attention" mechanism stumbled on the power of
grouping by shared context to generate new patterns. New patterns which
were still meaningful (still meaningful because "meaning" was conferred by
shared prediction, not by any arbitrary external category, like "cats").
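A toy sketch of that point (all vectors invented for illustration):
attention weights fall out of context similarity alone, so tokens that
share a context direction effectively group together, with no external
category labels involved.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Scaled dot-product attention weights for one query over a set of keys.
def attention_weights(query, keys):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Token vectors: the first two share a context direction, the third doesn't.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
w = attention_weights([1.0, 0.0], keys)

# The query attends most to the tokens it shares context with: the
# grouping emerges from similarity, not from any labeled category.
print(w.index(max(w)))   # the most-similar token wins
print(w[0] + w[1] > w[2])
```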


> Modern LLMs and the top compressors like NNCP use transformers, which
> model lateral inhibition and feedback like real brains, rather than just a
> feed forward network.
>

They use "feedback like real brains"?

Again, you're presenting this like settled science. No-one knows how brains
generate structure. Just having feedback says nothing. A badly tuned stereo
amp has feedback.


> The evidence for biological plausibility is that all of this works, even
> down to making the same kinds of mistakes as humans.
>

Yes, distributed representation works. And compression gets you some
patterns. Just as cats in 2010 gave us some patterns.

But there are many things which are inconsistent with a compression model.
Not least that LLMs get enormously big! (They use enormous power, need
enormous single training phases, can't learn worth a damn, can't create
worth a damn, don't structure, and depend on arbitrary assumptions about
"tokens"...)

Anyway, what do I think corresponds to LLM "weight" in contrast to spike
rate? I think neural groupings, probably grouped by phase, or spike time,
not spike rate.

-R

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta9b77fda597cc07a-Mc260618fa32b7c227ee2812c
Delivery options: https://agi.topicbox.com/groups/agi/subscription
