All this discussion of building a grammar seems to ignore the obvious fact that in humans, language learning is a continuous process that does not require any explicit encoding of rules. I think either your model should learn this way, or you need to explain why your model would be more successful by taking a different route. Explicit encoding of grammars has a long history of failure, so your explanation should be good. At a minimum, the explanation should describe how humans actually learn language and why your method is better.
Natural language has a structure that allows it to be learned in the same order that children learn it: lexical, then semantics, then grammar. Artificial language lacks this structure.

1. Lexical: word boundaries occur where the mutual information between n-grams (phoneme or letter sequences) on opposite sides is smallest. Words have a Zipf distribution, so that the vocabulary grows at a constant rate.
2. Semantics: words with related meanings are more likely to co-occur within a small time window.
3. Grammar: words of the same type (part of speech) are more likely to occur in the same immediate context.

The problem with statistical models trained on text is that the semantics is not grounded. A model can learn associations like rain...wet...water, but does not associate these words with sensory or motor I/O as humans do. So your language model might pass a text compression test or a Turing test, but would still lack the knowledge needed to integrate it into a robot.

Some have argued that this is a good enough reason to code knowledge explicitly (e.g. expert systems, Cyc), but I don't buy it. Where is the mechanism for updating the knowledge base during a conversation?

Some have argued that we should use an artificial or simplified language to make the problem easier, but I don't buy that either. Artificial languages are designed to be processed in the wrong order: lexical, grammar, semantics. How do you transition to natural language? You cannot parse natural language without knowing the meanings of the words. You would avoid that problem if you learned the meanings first, before learning the grammar.

-- Matt Mahoney, [EMAIL PROTECTED]
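
A minimal sketch of the lexical step described above, under stated assumptions: it uses single letters (n = 1) instead of longer n-grams, scores each gap by the pointwise mutual information of the two adjacent characters, and places a boundary wherever that score is a strict local minimum. The function names `boundary_scores` and `segment` are hypothetical, and the local-minimum rule is one simple reading of "where the mutual information is smallest", not a tuned segmenter.

```python
import math
from collections import Counter


def boundary_scores(text):
    """Return the pointwise mutual information across every gap in text.

    scores[i] is the PMI between text[i] and text[i + 1], estimated from
    character and character-pair counts in text itself.
    """
    unigrams = Counter(text)
    bigrams = Counter(zip(text, text[1:]))
    n_uni = len(text)
    n_bi = len(text) - 1
    scores = []
    for a, b in zip(text, text[1:]):
        p_ab = bigrams[(a, b)] / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores.append(math.log2(p_ab / (p_a * p_b)))
    return scores


def segment(text):
    """Split text at gaps whose PMI is a strict local minimum."""
    scores = boundary_scores(text)
    words, start = [], 0
    # The first and last gaps have only one neighbour, so they are skipped.
    for i in range(1, len(scores) - 1):
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


# Toy input: "ab" occurs twice and "ba" once, so the middle gap scores lowest.
print(segment("abab"))  # -> ['ab', 'ab']
```

On real text the statistics would have to come from a much larger unsegmented corpus, and longer n-grams on each side of the gap would likely give sharper boundaries than single letters.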