All this discussion of building a grammar seems to ignore the obvious fact
that in humans, language learning is a continuous process that does not
require any explicit encoding of rules.  Either your model should learn this
way, or you need to explain why it would be more successful by taking a
different route.  Explicit encoding of grammars has a long history
of failure, so your explanation should be good.  At a minimum, the explanation
should describe how humans actually learn language and why your method is
better.

Natural language has a structure that allows it to be learned in the same
order that children learn it: lexical, semantics, grammar.  Artificial
languages lack this structure.

1. Lexical: word boundaries occur where the mutual information between the
n-grams (phoneme or letter sequences) on opposite sides of the boundary is
smallest.  Word frequencies follow a Zipf distribution, so the vocabulary
grows at a constant rate.

2. Semantics: words with related meanings are more likely to co-occur within a
small time window.

3. Grammar: words of the same type (part of speech) are more likely to occur
in the same immediate context.
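
As a rough illustration of the three statistics above, here is a minimal
Python sketch.  It is a toy under stated assumptions, not a description of how
children or any real system do it: pointwise mutual information estimated from
the text itself stands in for "mutual information", a boundary is guessed at
every local minimum, the "small time window" is a fixed number of tokens, and
"immediate context" means just the previous and next word.  All function names
(boundary_scores, segment, cooccurrence, context_vectors, cosine) are
hypothetical.

```python
from collections import Counter
import math


def boundary_scores(text, n=1):
    """Step 1: pointwise mutual information between the n-gram ending at
    position i and the n-gram starting at i. Low scores suggest a boundary."""
    positions = range(n, len(text) - n + 1)
    left = Counter(text[i - n:i] for i in positions)
    right = Counter(text[i:i + n] for i in positions)
    pairs = Counter((text[i - n:i], text[i:i + n]) for i in positions)
    total = sum(pairs.values())
    return {
        i: math.log(pairs[(text[i - n:i], text[i:i + n])] * total
                    / (left[text[i - n:i]] * right[text[i:i + n]]))
        for i in positions
    }


def segment(text, n=1):
    """Cut the text at every strict local minimum of the boundary score
    (an assumed heuristic; other thresholds are possible)."""
    scores = boundary_scores(text, n)
    cuts = [i for i in sorted(scores)
            if i - 1 in scores and i + 1 in scores
            and scores[i] < scores[i - 1] and scores[i] < scores[i + 1]]
    pieces, start = [], 0
    for c in cuts:
        pieces.append(text[start:c])
        start = c
    pieces.append(text[start:])
    return pieces


def cooccurrence(tokens, window=3):
    """Step 2: count unordered pairs of distinct words that appear within
    `window` tokens of each other."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if v != w:
                counts[frozenset((w, v))] += 1
    return counts


def context_vectors(tokens):
    """Step 3: for each word, count its immediate left and right neighbors."""
    ctx = {}
    for i, w in enumerate(tokens):
        c = ctx.setdefault(w, Counter())
        if i > 0:
            c[("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            c[("R", tokens[i + 1])] += 1
    return ctx


def cosine(a, b):
    """Cosine similarity of two context-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

On the toy string "abcdababcdcdab", segment() recovers the pieces "ab" and
"cd", because "a" always precedes "b" and "c" always precedes "d" while the
transitions between pieces vary.  Likewise, in "the dog runs . the cat runs ."
the words "dog" and "cat" share the same neighbors, so their context vectors
come out far more similar to each other than to "runs".  Real text needs much
more data and larger n before any of these statistics become reliable.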

The problem with statistical models trained on text is that the semantics is
not grounded.  A model can learn associations like rain...wet...water, but
does not associate these words with sensory or motor I/O as humans do.  So
your language model might pass a text compression test or a Turing test, but
would still lack the knowledge needed to integrate it into a robot.

Some have argued that this is a good enough reason to code knowledge
explicitly (e.g. expert systems, Cyc), but I don't buy it.  Where is the
mechanism for updating the knowledge base during a conversation?

Some have argued that we should use an artificial or simplified language to
make the problem easier, but I don't buy it.  Artificial languages are
designed to be processed in the wrong order: lexical, grammar, semantics.  How
do you transition to natural language?  You cannot parse natural language
without knowing the meanings of the words.  You would have avoided that
problem if you learned the meanings first, before learning the grammar.


-- Matt Mahoney, [EMAIL PROTECTED]

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
