On Sat, Aug 30, 2025 at 3:24 AM Matt Mahoney <[email protected]> wrote:
> On Thu, Aug 28, 2025, 4:28 AM Rob Freeman <[email protected]> wrote:
>> But where are you getting the idea that LLMs correlate closely with
>> spike rate based neural models, Matt? You make it sound like it is
>> settled neuroscience.
>
> What do you think the activation level in an artificial neuron
> represents?

How are LLM weights represented in the neural context? Personally, I think the corresponding qualities in the neural context are probably neural groupings.

To make an analogy: if you take a population of people, you might represent the distribution of their ethnicity by assigning "weights" to a set of "ethnic" symbols. Then you replace the population with ethnic symbols carrying different weights. But the choice of symbols is always going to be subjective. It's much more powerful to keep the population, and group it to generate different "weights"/symbols at need.

The only question is how you do the grouping. And the answer to that for language models can be the same as the way it's done for LLMs now: you group (rather than generate weights) according to how elements share contexts. That's the mechanism which has worked for LLMs. And it's been a big revolution in AI. (Why? I think exactly because it allows representation to change and form different groupings.)

The difference with the "grouping" model is that you don't fix the "weights" rigidly at a single time of training. If you think of weights as groupings instead, you can make new groupings all the time, at need.

So I don't think you need a model equating spike rate to weights. You can have a model equating "groupings" to weights. And a grouping-to-weight model corresponds better to a phase-based code (group neurons according to how their phases synchronize, or order in some other physical way dependent on context).

So I think weights in the neural implementation will correspond to different groupings of neurons. Actually grouped by shared contexts. Physically grouped by their shared contexts.
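As an aside, "grouping by shared contexts" can be sketched concretely. A minimal sketch, assuming a toy corpus and treating immediate neighbours as context (both are illustrative assumptions, not anything from this thread):

```python
from collections import defaultdict

# Toy corpus (an assumption for illustration): group words by the contexts
# they share, computed at need, instead of fixing weights once at training.
corpus = ("the cat sat on the mat the dog sat on the rug "
          "a cat ate a fish a dog ate a bone").split()

# Record the set of (left neighbour, right neighbour) contexts of each word.
contexts = defaultdict(set)
for i in range(1, len(corpus) - 1):
    contexts[corpus[i]].add((corpus[i - 1], corpus[i + 1]))

def share_context(w1, w2):
    """Two words fall into the same ad-hoc group if they share any context."""
    return bool(contexts[w1] & contexts[w2])

# Groupings are recomputed from the retained "population" of usages,
# not summarized away into fixed weights.
print(share_context("cat", "dog"))  # both occur in "the _ sat" and "a _ ate"
print(share_context("cat", "on"))
```

The point of the sketch is only that the raw distribution is kept, so different groupings can be generated from it later, at need.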
Probably by some resonance mechanism. Degenerating, in the aggregate, to a summarization as weights, as in LLMs, which group... "learn" weights based on shared context. But not relying on a single "learning" phase (using backprop.)

> In the top text compressors that I am familiar with, neurons represent
> features, which in LLMs can be letters, tokens, or grammatical or
> semantic categories. Activation levels are calculated as a weighted sum
> of inputs and then clamped. The network is trained by adjusting the
> weights to reduce prediction errors.

The "compression" model, the basis of the Hutter Prize, was always flawed. Actually, groupings can _expand_ a representation. Find new orders all the time. That's what we miss. The contribution of the Hutter Prize to our knowledge has been barren. It didn't find the semantic primitives Hutter envisioned for it.

You can take any one of those orders found by one or another (partial) compression and call it a "representation" of the system. And as one of the (infinite) possible orders, it does represent information. The fact of ordering has been a partial extraction of information, so you get (partial) success by doing that. But you've missed the point that the orders can be changed, to create new orders. So you get some value, but miss the essence of the system, which is its creativity. It's no accident that creativity is the bit current LLMs do so poorly, limited to random "hallucinations", or noise within rigid learned orders.

> It is essentially the model described by Rumelhart and McClelland in
> the 1980s.

The insight of distributed representation was the first step, yes. Rumelhart and McClelland summarized that wonderfully in the '80s. And then it finally reached full expression when society invested enough (not in research, but in hardware to play games) for hardware to catch up, around 2010. But Rumelhart and McClelland 20 years earlier, yes, wonderful. The insight of distributed representation.
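For concreteness, the neuron quoted above (weighted sum of inputs, clamped, with weights adjusted to reduce prediction error) can be sketched minimally. The inputs, target, and learning rate are illustrative assumptions, not anything from the quoted compressors:

```python
import math

def activate(weights, inputs):
    """Weighted sum of inputs, clamped into (0, 1) by a logistic squash."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-s))

def train_step(weights, inputs, target, lr=0.5):
    """Delta rule: nudge each weight in the direction that shrinks the error."""
    err = target - activate(weights, inputs)
    return [w + lr * err * x for w, x in zip(weights, inputs)]

# Illustrative training loop: feature 0 is made to predict the target.
weights = [0.0, 0.0]
for _ in range(200):
    weights = train_step(weights, [1.0, 0.0], 1.0)

print(activate(weights, [1.0, 0.0]))  # approaches 1.0 as the error shrinks
```

The single "learning" phase being objected to above is exactly this kind of loop: the weights are frozen once training stops.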
But what was the distributed representation insight at base? I'm not sure anyone thinks about this. Many people attribute the success of NNs to "learning". Even today we talk more about "machine learning" than about distributed representation. The value most people seem to attribute to the NN revolution is the fact that you can point a network at a problem and have it produce (some kind of) answer WITHOUT thinking! No one thinks about it, because people think the value of the model is that you don't have to think about it! Not thinking is what's good about it!

I say the true insight waiting for us to see it in distributed representation, still waiting, was that patterns could vary. Conferring robustness, graceful degradation, etc. But in NNs circa Rumelhart and McClelland, the variation was still constrained by fixed patterns. Usually patterns fixed, or "supervised" externally in some way. The 2010 revolution started when NNs discovered "cats". But you still had to have cats.

The next big jump came in 2017. What was it? Lots more NOT THINKING applied to this. People don't think much at all about why LLMs work. (A recent initiative from the Simons Foundation excepted? "Artificial intelligence tools have rapidly advanced over the last few years and entered our day-to-day lives, yet we still fundamentally don't understand what they're doing under the hood," says Collaboration Director Surya Ganguli, https://www.simonsfoundation.org/2025/08/18/simons-foundation-launches-collaboration-on-the-physics-of-learning-and-neural-computation/) They just go straight to whatever latest API whatever big corp has released new this week...

But I say the "attention" mechanism stumbled on the power of grouping by shared context to generate new patterns. New patterns which were still meaningful (still meaningful because "meaning" was conferred by shared prediction, not by any arbitrary external category, like "cats").
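To make the 2017 reference concrete: scaled dot-product attention recomputes, for every input, how much each position "groups with" every other position, rather than fixing those relations in advance. A minimal sketch with tiny made-up 2-d vectors (the vectors and values are illustrative assumptions):

```python
import math

def softmax(xs):
    """Normalize scores into weights that sum to 1."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention: weight values by query-key similarity."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Three positions; the first two have similar keys, so a matching query
# groups them together dynamically, for this input only.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
vals = [[1.0], [1.0], [0.0]]
result = attention([[1.0, 0.0]], keys, vals)
print(result[0][0])  # most of the attention mass lands on the two similar keys
```

The grouping here is transient: change the input query and a different grouping forms, which is the contrast being drawn with externally fixed categories like "cats".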
> Modern LLMs and the top compressors like NNCP use transformers, which
> model lateral inhibition and feedback like real brains, rather than
> just a feed-forward network.

They use "feedback like real brains"? Again, you're presenting this like settled science. No one knows how brains generate structure. Just having feedback says nothing. A badly tuned stereo amp has feedback.

> The evidence for biological plausibility is that all of this works,
> even down to making the same kinds of mistakes as humans.

Yes, distributed representation works. And compression gets you some patterns, just as cats in 2010 gave us some patterns. But there are many things which are inconsistent with a compression model. Not least that LLMs get enormously big! (They use enormous power, have enormous single training phases, can't learn worth a damn, can't create worth a damn, don't structure, and depend on arbitrary assumptions of "token"...)

Anyway, what do I think corresponds to LLM "weight", in contrast to spike rate? I think neural groupings, probably grouped by phase, or spike time, not spike rate.

-R

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta9b77fda597cc07a-Mc260618fa32b7c227ee2812c
Delivery options: https://agi.topicbox.com/groups/agi/subscription
