Someone else recognized the truth long ago: https://www.reddit.com/r/MachineLearning/comments/47j8j6/is_deep_learning_a_markov_chain_in_disguise/
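To make the comparison concrete (my own sketch, not from that thread): a bigram Markov chain "learns" purely by counting exact context matches seen in training and predicting the most frequent continuation, with no gradient descent or curve manipulation anywhere.

```python
from collections import defaultdict, Counter

def train_bigram(tokens):
    """Count, for each token, how often each next token follows it (exact matches only)."""
    table = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        table[a][b] += 1
    return table

def predict(table, context):
    """Return the most frequent continuation of an exactly matched context, or None."""
    if context not in table:
        return None  # context never seen in training: no exact match, no prediction
    return table[context].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran".split()
table = train_bigram(corpus)
print(predict(table, "the"))    # "cat" (follows "the" twice, "mat" only once)
print(predict(table, "zebra"))  # None (no exact match in training data)
```

No weights are tuned here; the whole "model" is a lookup table of observed patterns, which is the sense in which it is easy to understand.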
If you take a 500-dimensional dataset, as Juan showed in that refreshing article, the question becomes: given a new, unseen data point, how can you learn general patterns in the high-dimensional space without overfitting? In a 2D example we may have a few blue dots surrounded by red dots, and after you run your pattern check on the unseen point you can determine where it sits in that 2D space: is it inside the blue ball zone or not? Or maybe it's both blue and red; it may look like both a rat and a lion, after all.

But the point is, it is not all just "tweak the weights until they give you those patterns", otherwise Transformers wouldn't need positional encoding, embeddings like GloVe/Word2Vec/Seq2Vec, or self-attention! Or BPE and normalization. Mine has all the same things too, just with no curve manipulation going on. I still need pooling, an activation function, and weights, BTW, but it's as easy to understand as a Markov chain.

At best, backpropagation is just an optimization to make HMMs faster; it can't be a new way to find patterns. All patterns start at exact matches, and clearly it is doing all the things mine does. I'm therefore not one bit interested in the curve manipulation in backprop / the net; I'm only interested in the pattern keys, the organs of it all: BPE, self-attention, embedding relations, normalization, pooling energy, all the things that actually look for patterns in data. Backprop is not part of AI; it's a shirt on a body, not the muscles.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta86fa089ebd8ca28-Mbc756438f56627aebb9a8c2a
Delivery options: https://agi.topicbox.com/groups/agi/subscription
