On Wednesday, June 29, 2022, at 10:29 AM, Rob Freeman wrote:
> You would start with the relational principle those dot products learn, by
> which I mean grouping things according to shared predictions, make it instead
> a foundational principle, and then just generate groupings with them.
Isn't that what backprop does anyway? What transformers do, as I understand it, is use that very same backprop to learn more general embeddings before learning case-specific soft weights. That's pretraining. So the learning is split into two mechanically very similar stages, one more general and one less general. That principle can be extended to multi-stage hierarchical pretraining, forming embeddings of embeddings, and so on. So it's general per se, but still perceptron + backprop at the core, which I think is horribly wasteful.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-Mb0f3ecc8c16f5fde4577c176
Delivery options: https://agi.topicbox.com/groups/agi/subscription
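To make the two-stage split above concrete, here is a minimal toy sketch (not anyone's actual proposal): backprop first learns general-purpose embeddings on a reconstruction objective ("pretraining"), then those embeddings are frozen while the same gradient machinery fits case-specific weights on top. The data, the tiny linear autoencoder objective, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X[:, :2] *= 3.0                            # give two directions most of the variance
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy labels for the second stage

# Stage 1: "pretraining" -- learn an 8->4 linear embedding W by plain
# gradient descent on the reconstruction loss ||X W W^T - X||^2.
W = rng.normal(scale=0.1, size=(8, 4))
for _ in range(2000):
    E = X @ W @ W.T - X                    # reconstruction error
    grad_W = 2 * (X.T @ E @ W + E.T @ X @ W)
    W -= 5e-5 * grad_W

# Stage 2: freeze the embeddings and fit case-specific weights
# (a logistic head) with the very same backprop-style updates.
Z = X @ W                                  # frozen, more general embeddings
v, b = np.zeros(4), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Z @ v + b)))
    v -= 0.1 * Z.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

acc = np.mean(((Z @ v + b) > 0) == (y > 0.5))
print(f"train accuracy: {acc:.2f}")
```

Both stages are "perceptron + backprop" in miniature; stacking further stages on Z instead of X would give the embeddings-of-embeddings hierarchy mentioned above.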