On Wednesday, June 29, 2022, at 10:29 AM, Rob Freeman wrote:
> You would start with the relational principle those dot products learn, by 
> which I mean grouping things according to shared predictions, make it instead 
> a foundational principle, and then just generate groupings with them.

Isn't that what backprop does anyway?

What transformers do, as I understand it, is use that very same backprop to 
learn more general embeddings before learning case-specific soft weights. 
That's pretraining. So the learning is split into two mechanically very 
similar stages, one more general and one less general. 
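
To make "case-specific soft weights" concrete, here is a minimal sketch, not 
any particular model: dot products between embeddings are softmaxed into a 
per-token weighting. The dimensions and tokens are made up, and the random 
vectors stand in for embeddings that pretraining would have produced.

import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (illustrative)
tokens = ["the", "cat", "sat"]
E = rng.normal(size=(len(tokens), d))   # stand-ins for pretrained embeddings

def soft_weights(E, d):
    # Scaled dot-product attention weights: one distribution per token.
    scores = E @ E.T / np.sqrt(d)                 # pairwise dot products
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

W = soft_weights(E, d)      # each row sums to 1: the case-specific mixing
print(np.round(W, 2))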

That principle can be extended to multi-stage hierarchical pretraining, forming 
embeddings of embeddings, and so on. So it's general per se, but it's still 
perceptron + backprop at the core, which I think is horribly wasteful.
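
Schematically, and only as a sketch under my own assumptions, "embeddings of 
embeddings" would look like each stage re-embedding pooled outputs of the 
previous one. The random weight matrices below are placeholders for layers 
that, in a real system, would each be trained with that same perceptron + 
backprop machinery.

import numpy as np

rng = np.random.default_rng(1)

def stage(X, d_out, rng):
    # One pretraining stage reduced to its skeleton: linear map + nonlinearity.
    W = rng.normal(size=(X.shape[1], d_out))  # stand-in for a trained layer
    return np.tanh(X @ W)

tokens = rng.normal(size=(12, 16))           # 12 tokens, 16-dim input features
e1 = stage(tokens, 8, rng)                   # stage 1: token embeddings
chunks = e1.reshape(4, 3, 8).mean(axis=1)    # pool triples into "phrases"
e2 = stage(chunks, 8, rng)                   # stage 2: embeddings of embeddings
print(e1.shape, e2.shape)                    # (12, 8) (4, 8)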