Ben, I was using "linear" in two senses. One is the sense of Bengio's original neural language model, where word encodings were devoid of context. The other is the sense Goodfellow uses in this lecture:
Do statistical models understand the world? Ian Goodfellow
https://www.youtube.com/watch?v=hDlHpBBGaKs&t=19m5s
"Modern deep nets are very (piecewise) linear"

I doubt this new work addresses that problem. Does it? That was the second sense of linear I was using.

For the first sense of linear, the context-free language model, this BERT architecture may capture some agreement phenomena, even dependency phenomena (the second paper?). Is this the "spotlight" attention mechanism linking encoding to context?

But where is the model for the meaning of a combination of words? Or, if meaning is too much, the model for how words combine into hierarchy? It all comes down to a principled theory for combining words, as opposed to just learning as many combination patterns as possible.

Deep learning is locked in the paradigm of learning as much as possible. It is like studying for a test by learning all the answers, rather than understanding principles which allow you to work out your own answers. That it only learns patterns, and does not have a principle by which new patterns can be created, is why we are trapped needing ever bigger data sets, eternally getting better but never becoming good enough. More and more agreement, more and more dependency, but never all agreement or all dependency. However much data you learn, it will never be enough, because you can never learn the novelty which comes from a principle by which new patterns (let alone meaning) can be created.

That's what this work is lacking.

-Rob

On Sun, Feb 17, 2019 at 2:08 PM Ben Goertzel <b...@goertzel.org> wrote:
> Rob,
>
> These deep NNs certainly are not linear models, and they do capture a
> bunch of syntactic phenomena fairly subtly, see e.g.
> https://arxiv.org/abs/1901.05287
>
> "I assess the extent to which the recently introduced BERT model
> captures English syntactic phenomena, using (1) naturally-occurring
> subject-verb agreement stimuli; (2) "colorless green ideas"
> subject-verb agreement stimuli, in which content words in natural
> sentences are randomly replaced with words sharing the same
> part-of-speech and inflection; and (3) manually crafted stimuli for
> subject-verb agreement and reflexive anaphora phenomena. The BERT
> model performs remarkably well on all cases."
>
> This paper shows some dependency trees implicit in transformer networks:
>
> http://aclweb.org/anthology/W18-5431
>
> This stuff is not AGI and does not extract deep semantics nor do
> symbol grounding etc. For sure it has many limitations. But it's
> also not so trivial as you're suggesting IMO...
>
> -- Ben G

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T581199cf280badd7-Mbe9f72d457a0326b7c6610f3
Delivery options: https://agi.topicbox.com/groups/agi/subscription
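P.S. Goodfellow's "piecewise linear" point can be made concrete with a tiny sketch. This is a hypothetical toy network of my own (not from either paper): a ReLU net computes a different affine map in each activation region, so within any one region it behaves exactly linearly, and "nonlinearity" only appears when you cross region boundaries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-layer ReLU net: f(x) = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((2, 8)), rng.standard_normal(2)

def f(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def local_affine(x):
    """Within x's activation region, f is exactly affine: f(y) = A @ y + c."""
    mask = (W1 @ x + b1 > 0).astype(float)  # which ReLU units are active at x
    A = W2 @ (W1 * mask[:, None])           # effective linear map in this region
    c = W2 @ (b1 * mask) + b2               # effective offset in this region
    return A, c

x = rng.standard_normal(4)
A, c = local_affine(x)

# The affine identity holds exactly at x (pure algebra, no approximation):
print(np.allclose(f(x), A @ x + c))  # True

# A nearby point with the same ReLU sign pattern gets the SAME affine map:
y = x + 1e-4 * rng.standard_normal(4)
if np.array_equal(W1 @ x + b1 > 0, W1 @ y + b1 > 0):
    print(np.allclose(f(y), A @ y + c))
```

The whole function is a patchwork of such affine pieces, which is the sense in which a deep ReLU net is "very (piecewise) linear" despite being nonlinear globally.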