Ben,

I was using linear in two senses. One is Bengio's original NLM, where word
encodings were devoid of context. The other is the sense Goodfellow uses in
this lecture:

Do statistical models understand the world? Ian Goodfellow
https://www.youtube.com/watch?v=hDlHpBBGaKs&t=19m5s

"Modern deep nets are very (piecewise) linear"

I doubt this new work addresses that problem. Does it?
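
To be concrete about that second sense: a ReLU network computes a different
affine map in each region of input space where its units' on/off pattern is
fixed. A minimal numpy sketch with a made-up toy network (nothing from the
papers) checks this directly:

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def net(x):
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer
    return W2 @ h + b2

def relu_pattern(x):
    return (W1 @ x + b1) > 0           # which hidden units are active

x = rng.normal(size=4)
d = 1e-3 * rng.normal(size=4)          # a small perturbation of the input

if np.array_equal(relu_pattern(x), relu_pattern(x + d)):
    # Same activation pattern, so the net is exactly affine here:
    # f(x + d) - f(x) == J @ d, with J = W2 @ diag(pattern) @ W1.
    J = W2 @ np.diag(relu_pattern(x).astype(float)) @ W1
    print(np.allclose(net(x + d) - net(x), J @ d))   # prints True

Within each such region there is no curvature at all; the nonlinearity only
shows up when you cross a boundary between regions.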

That was the second sense of linear I was using. For the first sense of
linear, the context-free language model, this BERT architecture may capture
some agreement phenomena, even dependency phenomena (the second paper?). Is
this the "spotlight" attention mechanism linking encoding to context?

But where is the model for the meaning of a combination of words? Or, if
meaning is too much, the model for how words combine into hierarchy? It all
comes down to a principled theory for combining words, as opposed to just
learning as many combination patterns as possible.
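
To give a feel for what I mean by principled, as opposed to pattern
learning: something like a single composition rule applied recursively over
a parse tree, so new combinations are handled by the rule rather than looked
up from memorised patterns. A purely hypothetical sketch (the words, the
tree and the compose() rule are made up for illustration):

import numpy as np

rng = np.random.default_rng(2)
dim = 4
words = {w: rng.normal(size=dim) for w in ["the", "dog", "chased", "cats"]}
W = rng.normal(size=(dim, 2 * dim))

def compose(left, right):
    # one reusable rule for combining two constituents into a phrase vector
    return np.tanh(W @ np.concatenate([left, right]))

def encode(tree):
    # recurse over the hierarchy: a leaf is a word, a node is a combination
    if isinstance(tree, str):
        return words[tree]
    left, right = tree
    return compose(encode(left), encode(right))

print(encode((("the", "dog"), ("chased", "cats"))))   # ((the dog) (chased cats))

Whether or not that particular rule is the right one, the point is that the
combination is generated by a principle over structure, not retrieved from a
store of previously seen patterns.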

Deep learning is locked in the paradigm of learning as much as possible.
Like studying for a test by learning all the answers, rather than
understanding principles which allow you to work out your own answers. That
it only learns patterns, and does not have a principle by which new
patterns can be created, is why we are trapped needing ever bigger data
sets, eternally getting better, but never becoming good enough. More and
more agreement, more and more dependency, but never all agreement or all
dependency. However much data you learn, it will never be enough, because you
can never learn the novelty which comes from a principle by which new
patterns (let alone meaning) can be created.

That's what this work is lacking.

-Rob

On Sun, Feb 17, 2019 at 2:08 PM Ben Goertzel <b...@goertzel.org> wrote:

> Rob,
>
> These deep NNs certainly are not linear models, and they do capture a
> bunch of syntactic phenomena fairly subtly, see e.g.
>
> https://arxiv.org/abs/1901.05287
>
> "I assess the extent to which the recently introduced BERT model
> captures English syntactic phenomena, using (1) naturally-occurring
> subject-verb agreement stimuli; (2) "colorless green ideas"
> subject-verb agreement stimuli, in which content words in natural
> sentences are randomly replaced with words sharing the same
> part-of-speech and inflection; and (3) manually crafted stimuli for
> subject-verb agreement and reflexive anaphora phenomena. The BERT
> model performs remarkably well on all cases."
>
> This paper shows some dependency trees implicit in transformer networks,
>
> http://aclweb.org/anthology/W18-5431
>
> This stuff is not AGI and does not extract deep semantics nor do
> symbol grounding etc.   For sure it has many limitations.   But it's
> also not so trivial as you're suggesting IMO...
>
> -- Ben G
>
